Ray and its growing ecosystem
Richard Liaw, Anyscale, @richliaw
Travis Addair, Uber, @TravisAddair
Overview of talk
● Overview of Ray
● Ray’s ecosystem integrations
● Uber Open Source + Ray
What is Ray?
Mission: Simplify distributed computing.
Origin of Ray
What is Ray?
Relation to Ecosystem
● Similar in nature: Dask, Celery, Erlang, Akka, gRPC
● Runs on top of AWS, GCP, Azure, Kubernetes, your laptop, …
● Compatible with Python ecosystem:
○ NumPy
○ Pandas
○ TensorFlow
○ PyTorch
○ SpaCy, …
Why Ray?
Distributed Systems are not New
HPC (1980s) -> Web (1990s) -> Big Data (2000s) -> Deep Learning (2010s)
No Longer Isolated Workloads
Big Data, Microservices, Deep Learning, and HPC are no longer isolated: the same applications increasingly combine all four. What system sits at their intersection?
Ray API

Functions -> Tasks

Start from ordinary Python functions:

def read_array(file):
    # read array "a" from "file"
    return a

def add(a, b):
    return np.add(a, b)

Adding the @ray.remote decorator turns a function into a task. Calling .remote() schedules the task asynchronously and immediately returns an object ID (a future); object IDs can be passed directly into other tasks to build a computation graph, and ray.get() blocks until a result is ready:

@ray.remote
def read_array(file):
    # read array "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote("/input1")
id2 = read_array.remote("/input2")
id3 = add.remote(id1, id2)
ray.get(id3)

Classes -> Actors

The same decorator turns a Python class into an actor: a stateful worker whose methods execute remotely against the actor's own state:

@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id4 = c.inc.remote()
id5 = c.inc.remote()
ray.get([id4, id5])

Resource requirements are declared in the same decorator, and Ray schedules the task or actor onto a node with the requested resources free:

@ray.remote(num_gpus=1)
def add(a, b):
    return np.add(a, b)

@ray.remote(num_gpus=1)
class Counter(object):
    ...
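Putting the pieces together, here is a minimal runnable sketch of the task API; the .npy input files are hypothetical and np.load stands in for the pseudocode file read above:

import numpy as np
import ray

ray.init()  # starts Ray locally; pass an address to join an existing cluster

@ray.remote
def read_array(file):
    # np.load stands in for the pseudocode "read array from file" above
    return np.load(file)

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote("input1.npy")  # hypothetical input files
id2 = read_array.remote("input2.npy")
id3 = add.remote(id1, id2)  # object IDs flow between tasks without ray.get
print(ray.get(id3))         # blocks until the final sum is available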
Ecosystem
Native Libraries | Third Party Libraries
Third-party integrations span ML training libraries, AutoML libraries, and cloud ML platforms.
Ray is becoming the go-to framework for scaling libraries.
Ray at Uber
Travis Addair, Uber
Ecosystem - Horovod
- Fast and easy distributed training for any framework
- Run Horovod on Ray (see the sketch after this list)
  - Any cloud provider or Kubernetes with the Ray cluster launcher
  - Hyperparameter search integration for Horovod
  - Benefits of the ecosystem (data processing, serving)
- Integration released in Horovod 0.20
  - ~400 lines of code
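A rough sketch of what Horovod on Ray looks like, based on the RayExecutor API that shipped with the 0.20 integration; the worker count and settings are illustrative, and parameter names may vary slightly between Horovod versions:

import ray
from horovod.ray import RayExecutor

ray.init(address="auto")  # connect to a running Ray cluster

def train_fn():
    # ordinary Horovod training code goes here (hvd.init(), etc.)
    pass

settings = RayExecutor.create_settings(timeout_s=30)
executor = RayExecutor(settings, num_workers=4, use_gpu=True)
executor.start()        # launches one Horovod worker actor per slot
executor.run(train_fn)  # runs the function on every worker
executor.shutdown()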
Ecosystem - Ludwig
- Code-free deep learning (AutoML)
- Given inputs and outputs, Ludwig builds the right model for any task (see the sketch after this list)
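As an illustration, a rough sketch using Ludwig's Python API; the feature names and dataset path are made up:

from ludwig.api import LudwigModel

# declare what goes in and what comes out; Ludwig assembles the model
config = {
    "input_features": [{"name": "review", "type": "text"}],         # hypothetical column
    "output_features": [{"name": "sentiment", "type": "category"}]  # hypothetical column
}

model = LudwigModel(config)
results = model.train(dataset="reviews.csv")  # hypothetical CSV file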
Ludwig: local mode
Ludwig: scalability challenges
- Single worker for preprocessing
  - Whole dataset must fit in memory (Pandas)
- Hyperparameter Optimization
  - Optimize over preprocessing (feature engineering)
  - Optimize over model params
  - Optimize over model architecture (encoders / decoders)
Ludwig: conventional ML workflow
Challenges with ML workflows
- Rewrite major sections of the code:
  - Pandas -> Spark transformers
  - Maintain two distinct code paths
- Each step is heavyweight and allocates heterogeneous infra
  - Airflow
- But what about hyperparameter optimization?
  - Dynamic process
  - Difficult to model using static workflow definitions
Examining the Ray ecosystem
Dask
- Drop-in replacement for Pandas (see the sketch below)
- Pure-Python data processing (low overhead, easy debugging)
- GPU acceleration with RAPIDS / cuDF
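A small sketch of the drop-in flavor; the file path and column names are illustrative:

import dask.dataframe as dd

# same API shape as pandas, but partitioned and lazily evaluated
df = dd.read_csv("ratings-*.csv")  # hypothetical input files
mean_rating = df.groupby("user_id")["rating"].mean()
print(mean_rating.compute())  # .compute() triggers the actual computation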
Horovod
- Framework-agnostic distributed training (TensorFlow, PyTorch, MXNet)
- Supports fault tolerance and auto-scaling
- Flexible: no restrictions on the structure of the training code (see the sketch below)
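For example, the PyTorch flavor keeps the user's own training loop and only wraps the optimizer; a toy sketch (the model and data are stand-ins):

import torch
import horovod.torch as hvd

hvd.init()  # one process per worker; rank and size come from the launcher

model = torch.nn.Linear(10, 1)  # stand-in model; the loop structure is yours
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# average gradients across workers with allreduce on each step
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# start every worker from identical weights
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for _ in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # toy objective
    loss.backward()
    optimizer.step()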
Ray
- Brings everything together as a single infra layer
- Provides scalable hyperparameter optimization and serving natively
Ludwig on Ray
Ludwig on Ray Tune
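Ludwig's hyperparameter search runs on Ray Tune. As a general illustration of the Tune API it builds on (a toy stand-in, not Ludwig's actual integration code):

from ray import tune

def trainable(config):
    # stand-in for a real training run; reports a score for this config
    tune.report(loss=(config["lr"] - 0.1) ** 2)

analysis = tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=20,
)
print(analysis.get_best_config(metric="loss", mode="min"))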
Ludwig on Ray Serve
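Trained models can be deployed with Ray Serve. A minimal sketch using the deployment API from recent Ray releases (an echo stand-in, not Ludwig's actual integration):

from ray import serve
from starlette.requests import Request

@serve.deployment
class Echo:
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        # a real deployment would run model.predict(payload) here
        return {"received": payload}

serve.run(Echo.bind())  # serves HTTP at http://127.0.0.1:8000/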
Looking forward: minimizing I/O
Check us out on GitHub
Horovod:
- https://horovod.ai/
- https://github.com/horovod/horovod
Ludwig:
- https://ludwig.ai/
- https://github.com/uber/ludwig
Getting Involved
Things you can do now
pip install ray
Join Ray Slack https://rb.gy/fntume
Browse docs: docs.ray.io
