Ray The alternative to distributed frameworks.pdf

Ray: The alternative to
distributed frameworks
李泓旻(Andrew Li)

2
About me
- Data Engineer @ Data Science & Technology, Cathay Financial Holdings
- Former one-stop engineer for data science (Manufacturing)
- Former Chemical Engineer
  - Polymer material, Genetic engineering, Bacterial fermentation
- D4SG (Data for Social Good) #4, winter 2018
- First prize, Genius For Home competition, MediaTek, 2018
- : orcahmlee

6
Four Reasons Why Leading Companies Are Betting On Ray, Anyscale
How Ray’s ecosystem powers Spotify’s ML scientists and engineers

14
Ray Tune:
Tuning with your favorite
ML framework

15
Ray Tune: Tuning with your favorite framework
and more......

21
search_optimization    Algorithm
"random"               (Random Search)
"bayesian"             SkoptSearch
"hyperopt"             HyperOptSearch
"bohb"                 TuneBOHB
"optuna"               Optuna

22
Modin:
A drop-in replacement for
pandas

Modin: A drop-in replacement for pandas
23

Modin: Architecture
25
Modin: Architecture

pandas API coverage
26
Modin vs. Dask DataFrame vs. Koalas

- Dask DataFrame and Koalas
  - Lazy execution
  - Support row-oriented partitioning and parallelism
- Modin
  - Eager execution
  - Supports row, column, and cell-oriented partitioning and parallelism
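
The three partitioning orientations can be illustrated with a minimal, standard-library sketch. It is not Modin's implementation: `split_ranges`, `partition`, `map_blocks`, and `assemble` are hypothetical helpers, and a thread pool stands in for a real execution engine.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partition(table, row_parts, col_parts):
    """Cut a list-of-rows table into a grid of blocks.

    row_parts > 1, col_parts == 1 -> row-oriented partitioning
    row_parts == 1, col_parts > 1 -> column-oriented partitioning
    both > 1                      -> block (cell-oriented) partitioning
    """
    row_bounds = split_ranges(len(table), row_parts)
    col_bounds = split_ranges(len(table[0]), col_parts)
    return [
        [[row[c0:c1] for row in table[r0:r1]] for (c0, c1) in col_bounds]
        for (r0, r1) in row_bounds
    ]


def map_blocks(grid, fn):
    """Apply an element-wise function to every block, one block per worker."""
    def apply(block):
        return [[fn(v) for v in row] for row in block]

    with ThreadPoolExecutor() as pool:
        return [list(pool.map(apply, block_row)) for block_row in grid]


def assemble(grid):
    """Stitch a grid of blocks back into one list-of-rows table."""
    table = []
    for block_row in grid:
        for rows in zip(*block_row):
            table.append([v for part in rows for v in part])
    return table


table = [[r * 4 + c for c in range(4)] for r in range(4)]
doubled = assemble(map_blocks(partition(table, 2, 2), lambda v: v * 2))
```

An element-wise operation such as the doubling above works with any of the three layouts; which layout is preferable for a given operation is the kind of decision the decomposition rules address.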
27

Decomposition
28
Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System

- Modin
  - If an API is not supported yet, it is executed in default-to-pandas mode
29

default to pandas
30
Defaulting to pandas

Supported APIs
(Y: implemented in Modin, D: defaults to pandas)
- pd.DataFrame
  - Y: iloc, T, all, any, quantile, apply, applymap...
  - D: plot, to_parquet, to_pickle, to_json...
- pd.Series
  - Y: iloc, T, all, any, quantile, apply, value_counts, to_frame...
  - D: plot, to_parquet, to_pickle, to_json...
- pd.read_<file>
  - Y: read_csv, read_parquet...
  - D: read_pickle, read_html...
- Utilities
  - Y: pd.concat, pd.unique, pd.get_dummies...
  - D: pd.cut, pd.to_datetime, pd.to_numeric...
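
The difference between a "Y" and a "D" entry can be sketched as follows. This is a toy illustration, not Modin's code: `SketchFrame` is a hypothetical class, a plain Python list plays the role of the single-node pandas object, and the warning text is an assumption of the sketch.

```python
import warnings


class SketchFrame:
    """Toy partitioned column with a partition-aware path and a fallback path."""

    def __init__(self, values, n_partitions=2):
        size = max(1, -(-len(values) // n_partitions))  # ceiling division
        self._partitions = [values[i:i + size] for i in range(0, len(values), size)]

    def _to_single_node(self):
        """Gather all partitions into one plain list (the 'pandas' stand-in)."""
        return [v for part in self._partitions for v in part]

    def sum(self):
        # "Y": reduce each partition independently, then combine the partials.
        return sum(sum(part) for part in self._partitions)

    def median(self):
        # "D": no partition-aware version here, so gather everything first.
        warnings.warn("median defaulting to single-node implementation", UserWarning)
        data = sorted(self._to_single_node())
        mid = len(data) // 2
        return data[mid] if len(data) % 2 else (data[mid - 1] + data[mid]) / 2


frame = SketchFrame([5, 1, 4, 2, 3])
total = frame.sum()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    middle = frame.median()
```

The fallback still returns a correct result; the cost is that the data has to be collected in one place before the operation runs.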
31
Supported APIs

Actor, Stateful
Task, Stateless
Programming model
35
Fire and Forget, AIM-120 AMRAAM
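
A stateless task can be launched now and collected later through a future. The following is a minimal sketch that uses Python's standard `concurrent.futures` as a stand-in; it is not Ray's API, and `square` is a hypothetical task.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """A stateless task: the result depends only on the argument."""
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a future for each launched task.
    futures = [pool.submit(square, n) for n in range(5)]
    # Block only at the point where the results are actually needed.
    results = [f.result() for f in futures]
```

Because the task keeps no state between calls, any worker can run it, and it can be re-run safely if a result is lost.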

Actor Model
- What the Actor Model is and why to use it
- Related languages/frameworks that implement the Actor Model:
  - Erlang, RabbitMQ, Akka
- Super useful references:
  - https://blog.techbridge.cc/2019/06/21/actor-model-in-web/
  - [COSCUP 2011] Programming for the Future, Introduction to the Actor Model and Akka Framework
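
The core idea of the Actor Model (private state, a mailbox, and messages handled one at a time) can be sketched with the standard library alone. `CounterActor` is a hypothetical example and does not reflect how Ray, Erlang, or Akka implement actors.

```python
import queue
import threading
from concurrent.futures import Future


class CounterActor:
    """Toy actor: private state, one mailbox, one thread draining it."""

    def __init__(self):
        self._count = 0  # state owned by this actor only
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Messages are handled strictly one at a time, so _count needs no lock.
        while True:
            message = self._mailbox.get()
            if message is None:  # sentinel: stop the actor
                break
            amount, reply = message
            self._count += amount
            reply.set_result(self._count)

    def increment(self, amount=1):
        """Send a message and return a future for the new count."""
        reply = Future()
        self._mailbox.put((amount, reply))
        return reply

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()


counter = CounterActor()
replies = [counter.increment() for _ in range(3)]
counts = [r.result() for r in replies]
counter.stop()
```

Unlike a stateless task, each call here depends on the calls before it, which is why an actor's method calls have to be processed in order by the same worker.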
36

Programming model
39
Ray: A Distributed Framework for Emerging AI Applications

Programming model
40

Architecture
43

Architecture - Application Layer
44

Architecture - System Layer
The system layer consists of three major components:
- Global Control Store (GCS)
- Bottom-Up Distributed Scheduler
- In-Memory Distributed Object Store
45

Global Control Store
47

48
- Maintains fault tolerance and low latency
- Enables every component in the system to be stateless
- Key-value store with pub-sub functionality
  - < v1.11.0: uses Redis
  - >= v1.11.0: no longer starts Redis by default
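
"Key-value store with pub-sub functionality" can be pictured with a minimal in-memory sketch. `ControlStore` and the key name below are hypothetical; the real GCS is a separate service and is not implemented this way.

```python
from collections import defaultdict


class ControlStore:
    """Toy in-memory key-value store that notifies subscribers on writes."""

    def __init__(self):
        self._data = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, key, callback):
        """Register callback(key, value) to run whenever `key` is written."""
        self._subscribers[key].append(callback)

    def put(self, key, value):
        self._data[key] = value
        for callback in self._subscribers[key]:
            callback(key, value)

    def get(self, key, default=None):
        return self._data.get(key, default)


store = ControlStore()
events = []
store.subscribe("object:42:location", lambda k, v: events.append((k, v)))
store.put("object:42:location", "node-A")
store.put("object:42:location", "node-B")
```

Because the state lives in the store and interested parties are notified of changes, the components that read and write it do not need to keep their own durable copy.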

Global Control Store (< v1.11.0)
49
Redis in Ray: Past and future

Global Control Store (>=v1.11.0)
50
Redis in Ray: Past and future

Fault tolerance
- Decouples the durable lineage storage from the other system components
- Heartbeat table, Job table, Actor table
52

Low latency
- Centralized schedulers couple task scheduling and task dispatch (Dask, Spark, CIEL)
- Involving the scheduler in each object transfer is prohibitively expensive
- Ray stores the object's metadata in the GCS rather than in the scheduler, fully decoupling task dispatch from task scheduling
53

Bottom-Up
Distributed Scheduler
54

Bottom-Up Distributed Scheduler
55

Existing cluster computing frameworks:
- Centralized schedulers: provide locality, but at latencies in the tens of ms (Spark, CIEL, Dryad)
- Distributed schedulers: can achieve high scale, but they either don't consider data locality (work stealing), or assume tasks belong to independent jobs (Sparrow), or assume the computation graph is known (Canary)
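
In contrast to the approaches above, the Ray paper describes a bottom-up scheme: a task is first offered to the local scheduler of the node that created it, and is forwarded to a global scheduler only if that node is overloaded or cannot satisfy the task's resource needs. The following is a rough sketch of that decision only. The class names, the queue threshold, and the use of queue length as a stand-in for estimated waiting time are all assumptions of the sketch, not Ray's code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: int
    queue: list = field(default_factory=list)

    def can_run(self, task_cpus):
        return task_cpus <= self.cpus


class GlobalScheduler:
    def __init__(self, nodes):
        self.nodes = nodes

    def place(self, task, task_cpus):
        # Among nodes that satisfy the requirement, pick the shortest queue
        # (a crude stand-in for "lowest estimated waiting time").
        candidates = [n for n in self.nodes if n.can_run(task_cpus)]
        if not candidates:
            raise RuntimeError(f"no node can run {task!r}")
        target = min(candidates, key=lambda n: len(n.queue))
        target.queue.append(task)
        return target.name


class LocalScheduler:
    QUEUE_THRESHOLD = 2  # hypothetical overload threshold

    def __init__(self, node, global_scheduler):
        self.node = node
        self.global_scheduler = global_scheduler

    def submit(self, task, task_cpus=1):
        overloaded = len(self.node.queue) >= self.QUEUE_THRESHOLD
        if self.node.can_run(task_cpus) and not overloaded:
            self.node.queue.append(task)  # schedule locally
            return self.node.name
        # Otherwise escalate "upward" to the global scheduler.
        return self.global_scheduler.place(task, task_cpus)


nodes = [Node("a", cpus=2), Node("b", cpus=8)]
global_scheduler = GlobalScheduler(nodes)
local_a = LocalScheduler(nodes[0], global_scheduler)
placements = [local_a.submit(f"t{i}") for i in range(3)]
big = local_a.submit("big", task_cpus=4)
```

The first two tasks stay on node `a`; the third is forwarded because the local queue has reached the threshold, and the 4-CPU task is forwarded because node `a` cannot satisfy it.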
56

In-Memory Distributed
Object Store
58

In-Memory Distributed Object Store
59

- Plasma: A High-Performance Shared-Memory Object Store
- Plasma was initially developed as part of Ray and is being developed as part of Apache Arrow
- On each node, Ray implements the object store via shared memory. This allows zero-copy data sharing between tasks running on the same node
- Plasma holds immutable objects in shared memory
60

- To minimize task latency, Plasma is used to store the inputs and outputs of every task, or stateless computation
- For low latency, Ray keeps objects entirely in memory and evicts them to disk as needed using an LRU policy
- Small objects (< 100 KiB): stored in the in-process object store
- Large objects: stored in the shared-memory object store
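
The size-based routing and LRU eviction described above can be sketched in a few lines. `SketchObjectStore` is hypothetical: plain dictionaries stand in for the in-process store and for shared memory, and evicted IDs are only recorded rather than written to disk.

```python
from collections import OrderedDict

SMALL_OBJECT_LIMIT = 100 * 1024  # 100 KiB, the threshold quoted above


class SketchObjectStore:
    def __init__(self, shared_capacity_bytes):
        self.in_process = {}         # small objects
        self.shared = OrderedDict()  # large objects, least recently used first
        self.capacity = shared_capacity_bytes
        self.used = 0
        self.evicted = []            # IDs pushed out (a real store would spill them)

    def put(self, object_id, payload):
        if len(payload) < SMALL_OBJECT_LIMIT:
            self.in_process[object_id] = payload
            return
        # Evict least recently used objects until the new one fits.
        while self.shared and self.used + len(payload) > self.capacity:
            old_id, old_payload = self.shared.popitem(last=False)
            self.used -= len(old_payload)
            self.evicted.append(old_id)
        self.shared[object_id] = payload
        self.used += len(payload)

    def get(self, object_id):
        if object_id in self.in_process:
            return self.in_process[object_id]
        payload = self.shared[object_id]
        self.shared.move_to_end(object_id)  # mark as most recently used
        return payload


object_store = SketchObjectStore(shared_capacity_bytes=300 * 1024)
large = bytes(128 * 1024)
object_store.put("small", b"x" * 10)
object_store.put("a", large)
object_store.put("b", large)
object_store.get("a")         # "a" becomes the most recently used
object_store.put("c", large)  # does not fit, so "b" is evicted
```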
61

Object spilling and persistence
- Spills objects to external storage once the capacity of the object store is used up (v1.3+)
- Two types of external storage are supported by default
- With local storage, writing every object to its own file would make the OS run out of inodes very quickly. If objects are smaller than 100 MB, Ray fuses them into a single file to avoid this problem
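
Fusing many small objects into one file can be pictured as appending the payloads back to back and remembering each object's offset and length. The sketch below shows only that idea; `spill_fused`, `restore`, and the file name are hypothetical and do not describe Ray's on-disk format.

```python
import os
import tempfile


def spill_fused(objects, path):
    """Append all objects to one file; return {object_id: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for object_id, payload in objects.items():
            index[object_id] = (f.tell(), len(payload))
            f.write(payload)
    return index


def restore(path, index, object_id):
    """Read a single object back using its recorded offset and length."""
    offset, length = index[object_id]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


objects = {"a": b"alpha", "b": b"bravo!", "c": b""}
spill_dir = tempfile.mkdtemp()
spill_path = os.path.join(spill_dir, "fused.bin")
index = spill_fused(objects, spill_path)
restored_b = restore(spill_path, index, "b")
```

Three objects end up in one file, so only one inode is consumed regardless of how many small objects are spilled together.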
64

65
Fault Tolerance
- Ray recovers any needed objects through lineage
re-execution. The lineage stored in the GCS tracks
both stateless tasks and stateful actors during
initial execution
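
Lineage re-execution for stateless tasks can be sketched as "remember how each object was produced, and recompute it on demand if it is missing". `LineageStore` below is a hypothetical single-process illustration; it does not cover actors, and it is not how Ray stores lineage.

```python
class LineageStore:
    def __init__(self):
        self.lineage = {}    # object_id -> (function, IDs of input objects)
        self.objects = {}    # object_id -> value (entries may be lost)
        self.executions = 0  # how many times any task has run

    def run(self, object_id, fn, *input_ids):
        """Record how object_id is produced, then compute it."""
        self.lineage[object_id] = (fn, input_ids)
        return self.get(object_id)

    def get(self, object_id):
        """Return the object, re-executing its lineage if it is missing."""
        if object_id not in self.objects:
            fn, input_ids = self.lineage[object_id]
            args = [self.get(i) for i in input_ids]  # recover inputs first
            self.objects[object_id] = fn(*args)
            self.executions += 1
        return self.objects[object_id]

    def lose(self, object_id):
        """Simulate a failure that drops one stored object."""
        self.objects.pop(object_id, None)


lineage_store = LineageStore()
lineage_store.run("x", lambda: 3)
lineage_store.run("y", lambda x: x * 2, "x")
lineage_store.run("z", lambda y: y + 1, "y")
lineage_store.lose("y")
lineage_store.lose("z")
recovered = lineage_store.get("z")
```

Only the lost objects are recomputed: `x` is still available, so recovering `z` re-runs the tasks for `y` and `z` and nothing else. This works because the tasks are deterministic and stateless.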

Ray Cluster on GCP/AWS/Azure
67
VM VM VM

Ray Cluster on K8s
68
POD POD
POD

Handling Dependencies
70
Source
