Distributed computing and
hyper-parameter tuning with Ray
Jan Margeta
November 17, 2018
jan@kardio.me @jmargeta
Healthier hearts Waste reduction Failure prevention
Hi, I am Jan
Computer vision and machine learning
Pythonista since 2.5+
Founder of KardioMe
Martin Fowler's First rule of
distributed objects computing
Don't
Massive complexity booster
See also Common fallacies of distributed computing
The world is
concurrent
Towards real-time decisions
AND ALSO
Resilience cannot be achieved with a single machine
Machine learning workflows often need heterogeneous
HW and intensive computations
Need to scale up and down on demand
ImageNet in 224 seconds
Concurrency and
parallelism in Python
Threads, processes, distributed, Dask, Celery, PySpark,
async…
3D printed model of your own heart
Pipeline stages: CT or MRI image, preprocess, segment, landmark estimation, view estimation, meshing, VR, 3D print
Stage types: GPU-based machine learning, CPU-intensive operation, WebVR-based UI, long-running external process
Cookie quality control
Acquisition, processing, visualisation
PySpark
mature, excellent for ETL, simple queries
"BigData" ecosystem in Java
better for homogeneous processing of the points
R = matrix(rand(M, F)) * matrix(rand(U, F).T)
ms = matrix(rand(M, F))
us = matrix(rand(U, F))
Rb = sc.broadcast(R)
msb = sc.broadcast(ms)
usb = sc.broadcast(us)
for i in range(ITERATIONS):
    ms = sc.parallelize(range(M), partitions) \
           .map(lambda x: update(x, usb.value, Rb.value)) \
           .collect()
    ms = matrix(np.array(ms)[:, :, 0])
    …
https://github.com/joost-de-vries/spark-sbt-seed/blob/master/src/main/python/als.py
Spark barriers vs dynamic task graphs
Ray: A Distributed Execution Framework for Emerging AI Applications Michael Jordan (UC
Berkeley)
Celery
computations defined beforehand
mature, support for retries, rate limiting…
group, chain, chord, map, starmap, chunks…
from celery import Celery

app = Celery('jobs', ...)

@app.task
def compute_stuff(x, y):
    return x + y

@app.task
def another_compute_stuff(x, y):
    return x + y

from jobs import compute_stuff, another_compute_stuff

compute_stuff.delay(1, 1).get()
compute_stuff.apply_async((2, 2), link=another_compute_stuff.s(16))
compute_stuff.starmap([(2, 2), (4, 4)])
http://docs.celeryproject.org/en/master/userguide/canvas.html
Dask
way more Pythonic than Spark
collections that play well with Python ecosystem
pickle, cloudpickle, msgpack, and custom numpy
global scheduler
https://dask.org/
import dask
@dask.delayed
def add(x, y):
    return x + y
x = add(1, 2)
y = add(x, 3)
y.compute()
Requirements
dynamic tasks with stateful computation
play well with existing ML tools in Python
heterogeneous code and hardware
fast with low latency
fault tolerant (node failure / addition / removal)
scale from multiple cores to multiple nodes
Ray
Ray is a general-purpose framework for parallel
and distributed Python, along with a collection of libraries
targeting machine learning and data processing workflows.
Developed at UC Berkeley as an attempt to replace Spark
https://github.com/ray-project/ray
Unique components
Clean API
Stateless tasks and actors combined
Bottom-up scheduling
Shared object store with zero copy deserialization
Most* of Ray's API
you will ever need
The rest is (mostly) Python as we know it
*Seriously, this is pretty much it
ray.init # connect to a Ray cluster
ray.remote # declare a task/actor & remote execution
ray.get # retrieve a Ray object and convert to a Python object
ray.put # manually place an object to the object store
ray.wait # retrieve results as they are made ready
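The calls above follow a futures model. As a rough analogy only, the sketch below uses Python's standard-library concurrent.futures rather than Ray: submit() stands in for .remote(), result() for ray.get, and wait(..., FIRST_COMPLETED) for ray.wait. The names slow_square and first_ready are made up for illustration, and a local thread pool is of course not a cluster.

```python
# Standard-library analogy only: concurrent.futures stands in for a Ray cluster.
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def slow_square(x, delay):
    """Pretend to be an expensive task."""
    time.sleep(delay)
    return x * x


def first_ready(delays):
    """Submit one task per delay; return the first finished result and all results."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        # submit() returns a future immediately, much like calling .remote()
        futures = [pool.submit(slow_square, i, d) for i, d in enumerate(delays)]
        # Returns as soon as any future is done: the role ray.wait plays above
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        first = next(iter(done)).result()
        # result() blocks until the value exists, like ray.get
        everything = [f.result() for f in futures]
    return first, everything
```

Calling first_ready([0.6, 0.02, 0.6]) hands back the result of the short task first, while the full list still comes back in submission order.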
Tasks
Create a task & schedule it throughout the cluster
from glob import glob
import cv2
import numpy as np
import ray

@ray.remote
def imread(fname):
    return cv2.imread(fname)

@ray.remote(num_cpus=1, num_gpus=0)
def threshold(image, threshold=128):
    return image > threshold

# Immediately returns future
future0 = imread.remote('python.png')
future1 = threshold.remote(np.ones((224, 224)))
futures = [imread.remote(f) for f in glob('*.png')]
Actors
A solution for mutable state
Instantiate the parameter server somewhere on the
cluster
@ray.remote
class ParameterServer(object):
    def __init__(self, keys, values):
        values = [value.copy() for value in values]
        self.weights = dict(zip(keys, values))

    def push(self, keys, values):
        for key, value in zip(keys, values):
            self.weights[key] += value

    def pull(self, keys):
        return [self.weights[key] for key in keys]
A single worker
@ray.remote
def worker(ps):
    while True:
        # Get the latest parameters
        weights = ray.get(ps.pull.remote(keys))
        # Compute an update of the params
        # (e.g. the gradients for neural nets)
        # Push the updates to the parameter server
        ps.push.remote(keys, gradients)

ps = ParameterServer.remote(keys, initial_values)
worker_tasks = [worker.remote(ps) for _ in range(10)]
Actors not only for storing
machine learning parameters
Note that pyhikvision is our custom wrapper to a vendor-specific library in Cython (Ray works!)
When interfacing with cameras, consider the vendor-agnostic and open-source Harvester.
@ray.remote
class Camera:
    def __init__(self, mac):
        self.cam = pyhikvision.Camera(mac=mac)
        self.cam.open()
        self.num_frames = 0

    def grab(self):
        self.num_frames += 1
        return self.cam.grab_frame()

    def total_frames(self):
        return self.num_frames

cam = Camera.remote(mac='xxxxxx')
Actors need no
locks for mutation!
Actor methods always called one by one
future0 = cam.grab.remote()
future1 = cam.total_frames.remote()
future2 = cam.grab.remote()
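Why no locks are needed can be shown with a small standard-library sketch. It is not how Ray implements actors: MiniActor and FrameCounter are hypothetical names, and FrameCounter is a hardware-free stand-in for the Camera class. All method calls go through one queue and run on a single private thread, so the state is only ever touched by one call at a time.

```python
# Standard-library sketch of the actor idea; this is not Ray's implementation.
import queue
import threading
from concurrent.futures import Future


class MiniActor:
    """Runs every method call on one private thread, in submission order."""

    def __init__(self, target):
        self._target = target            # the plain object that owns the state
        self._mailbox = queue.Queue()    # pending calls, first in, first out
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def call(self, method_name, *args):
        """Enqueue a call and return a future right away."""
        future = Future()
        self._mailbox.put((future, method_name, args))
        return future

    def _run(self):
        while True:
            future, method_name, args = self._mailbox.get()
            try:
                future.set_result(getattr(self._target, method_name)(*args))
            except Exception as exc:     # report failures through the future
                future.set_exception(exc)


class FrameCounter:
    """Hypothetical stand-in for the Camera class, without any hardware."""

    def __init__(self):
        self.num_frames = 0

    def grab(self):
        self.num_frames += 1             # unguarded mutation, no lock needed
        return self.num_frames

    def total_frames(self):
        return self.num_frames


def demo(num_callers=8, grabs_per_caller=100):
    """Hammer one actor from several threads and return the final count."""
    actor = MiniActor(FrameCounter())

    def caller():
        for _ in range(grabs_per_caller):
            actor.call("grab")

    threads = [threading.Thread(target=caller) for _ in range(num_callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Queued after every grab, so it observes all of them
    return actor.call("total_frames").result()
```

Even with eight threads submitting calls concurrently, no increment is lost, because the increments themselves are executed one by one.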
Get the results
@ray.remote
def heavy_computation():
    time.sleep(10)
    return np.zeros((224, 224))

future = heavy_computation.remote()
# This blocks until the future is done
arr = ray.get(future)
# All subsequent calls to ray.get return almost instantly
arr0 = ray.get(future)
arr1 = ray.get(future)
# Reuse the futures
thumb_future = make_a_thumbnail.remote(future)
landmarks_future = find_landmarks.remote(future)
Create
computational graph
Actors and remote functions interoperate seamlessly
Benefits of both stateless dataflow and actor
frameworks
Function can take values, futures, or even actor
handles as params
frame_id = camera.grab.remote()
thresholded_id = threshold.remote(frame_id)
thresholded = ray.get(thresholded_id)
Define by run JIT
import numpy as np
import ray

@ray.remote
def aggregate_data(x, y):
    return x + y

data = [np.random.normal(size=1000) for i in range(4)]
while len(data) > 1:
    intermediate_result = aggregate_data.remote(data[0], data[1])
    data = data[2:] + [intermediate_result]
result = ray.get(data[0])
https://ray-project.github.io/2017/05/20/announcing-ray.html
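The same define-by-run loop can be tried without a cluster. The sketch below is a standard-library analogy, with a thread pool standing in for Ray workers; resolve, aggregate, and tree_sum are illustrative names. One difference is stated in the code: here each task has to unwrap its own inputs, whereas Ray resolves futures passed as arguments before the task runs.

```python
# Standard-library analogy: a thread pool plays the role of the Ray workers.
from concurrent.futures import Future, ThreadPoolExecutor


def resolve(value):
    """Return the value behind a future, or the value itself."""
    return value.result() if isinstance(value, Future) else value


def aggregate(x, y):
    # Inputs may be futures of earlier tasks; Ray would resolve them for us
    return resolve(x) + resolve(y)


def tree_sum(values, max_workers=4):
    """Sum a non-empty sequence by building the task graph while iterating."""
    data = list(values)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each new task consumes the two oldest entries and appends its own
        # future, so a task only ever depends on tasks submitted before it
        while len(data) > 1:
            data = data[2:] + [pool.submit(aggregate, data[0], data[1])]
        return resolve(data[0])
```

Because the pool starts tasks in submission order and every task depends only on earlier ones, the loop also finishes with a single worker.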
Architecture
P. Moritz, R. Nishihara, et al.: Ray: A Distributed Framework for Emerging AI Applications
Worker & driver
Receive and execute tasks
Submit tasks to other workers
Driver is not assigned tasks for execution
Plasma - Shared
memory object store
share objects across local processes
in-memory key-value object store
data = ['Hello PyConBalkan', 4, (5, 5), np.ones((128, 128))]
key = ray.put(data)
deserialized = ray.get(key)
Apache Arrow serialization
Standard objects
Numpy arrays
See https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html
Local scheduler
driver can assign a task to a worker
bottom up scheduling
fractional resources
no more tasks in parallel than the number of CPUs
(multithreaded libs - restrict the number of threads...)
Global control state
take all metadata and state out of the system
centralize it in a redis cluster
everything else is largely stateless now
Global scheduler
reschedule tasks on other machines
Fault-tolerance
Failover to other nodes based on
the global control state
Non-actor tasks
lineage-based - rerun the tasks to reconstruct lost results
Actors (in the future)
recreate the actor from the beginning
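Lineage-based recovery for stateless tasks can be illustrated with a toy store. This is a sketch under simplifying assumptions (single process, deterministic tasks), and LineageStore with its methods is a hypothetical name, not part of Ray. The store remembers which function and inputs produced each result, so a lost result can be recomputed by replaying that recipe, recursing upstream where inputs are missing too.

```python
# Toy illustration of lineage-based recovery; names and structure are hypothetical.
import itertools


class LineageStore:
    """Keeps task results plus the recipe (function and inputs) that made them."""

    def __init__(self):
        self._values = {}     # object id -> cached result (may be lost)
        self._lineage = {}    # object id -> (function, ids of its inputs)
        self._ids = itertools.count()
        self.executions = 0   # how many times any task body actually ran

    def submit(self, fn, *input_ids):
        """Record the task, run it once, and return the id of its result."""
        obj_id = next(self._ids)
        self._lineage[obj_id] = (fn, input_ids)
        self.get(obj_id)
        return obj_id

    def get(self, obj_id):
        """Return the result, replaying the lineage if the value is missing."""
        if obj_id not in self._values:
            fn, input_ids = self._lineage[obj_id]
            args = [self.get(i) for i in input_ids]   # may recurse upstream
            self._values[obj_id] = fn(*args)
            self.executions += 1
        return self._values[obj_id]

    def lose(self, *obj_ids):
        """Simulate a node failure that wipes some cached results."""
        for obj_id in obj_ids:
            self._values.pop(obj_id, None)
```

Replaying only works because the tasks are stateless and deterministic. An actor carries state that a single recipe does not capture, which is why the slide treats actors separately.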
Does it scale?
[MuJoCo video]
Moritz, Nishihara et al.: Ray: A Distributed Framework
for Emerging AI Applications
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
On-prem cluster
Start head
    ray start --head --redis-port=6379
Start nodes with workers
    ray start --redis-address=192.168.1.5:6379  # head IP: 192.168.1.5
Connect and run commands
    ray.init(redis_address="192.168.1.5:6379")

    @ray.remote
    def imread(filename):
        return cv2.imread(filename)

    ims = ray.get([imread.remote(f) for f in glob('*.png')])
Teardown - stop ray process
    ray stop
On the cloud
Ready-made auto-scaling scripts for AWS and GCP
Create a cluster
    ray up ray/python/ray/autoscaler/aws/example-full.yaml
Destroy
    ray down ray/python/ray/autoscaler/aws/example-full.yaml
or write a custom provider
https://ray.readthedocs.io/en/latest/using-ray-on-a-large-cluster.html
Developing with Ray
Testing
usually trivial - in → out well defined
Debugging
webUI
breakpoint() or ipdb.set_trace()
Higher level libs built
on top of Ray
Tune
rllib
modin
distributed linear algebra
…
Model hyper-parameter tuning
Function-based API
A good idea to extract all training params anyway
def my_tunable_function(config, reporter):
    train_data, test_data = make_data_loaders(config)
    model = make_model(config)
    trainer = make_optimizer(model, config)
    for epoch in range(10):  # Could be an infinite loop too
        train(model, trainer, train_data)
        accuracy = evaluate(model, test_data)
        reporter(mean_accuracy=accuracy)
Class-based API
class MyTunableClass(Trainable):
    def _setup(self, config):
        self.train_data, self.test_data = make_data_loaders(config)
        self.model = make_model(config)
        self.trainer = make_optimizer(self.model, config)

    def _train(self):
        train_for_a_while(self.model, self.train_data, self.trainer)
        return {"mean_accuracy": eval_model(self.model, self.test_data)}

    def _save(self, checkpoint_dir):
        return save_model(self.model, checkpoint_dir)

    def _restore(self, checkpoint_path):
        self.model.load_state_dict(checkpoint_path)
Experiment config
experiment_spec = Experiment(
    "experiment_name",
    my_tunable_function_or_class,
    stop={"mean_accuracy": 98.5},
    config={
        "learning_rate": tune.grid_search([0.001, 0.01, 0.1]),
        "regularization": lambda x: 10 * np.random.rand(1),
    },
    trial_resources={
        "cpu": 1,
        "gpu": 0
    },
    num_samples=10
)
run_experiments(experiments=experiment_spec)
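To get a feel for how many trials such a spec produces, here is a rough standard-library sketch. It assumes that the whole grid is repeated once per sample and that callable entries are re-sampled for every trial, which is my reading of the behaviour rather than Tune's actual code. GridSearch, generate_trials, and the batch_size entry are hypothetical and exist only for this illustration.

```python
# Hypothetical sketch of how a search space expands into trials; not Tune's code.
import itertools
import random


class GridSearch:
    """Marks a parameter whose listed values must all be tried."""

    def __init__(self, values):
        self.values = list(values)


def generate_trials(config, num_samples=1, seed=0):
    """Return one concrete config dict per trial."""
    rng = random.Random(seed)
    grid_keys = [k for k, v in config.items() if isinstance(v, GridSearch)]
    grid_values = [config[k].values for k in grid_keys]
    trials = []
    for _ in range(num_samples):                       # repeat the whole grid
        for combo in itertools.product(*grid_values):  # every grid combination
            trial = dict(zip(grid_keys, combo))
            for key, value in config.items():
                if key in trial:
                    continue
                # Callables are sampled afresh per trial; constants pass through
                trial[key] = value(rng) if callable(value) else value
            trials.append(trial)
    return trials


space = {
    "learning_rate": GridSearch([0.001, 0.01, 0.1]),
    "regularization": lambda rng: 10 * rng.random(),
    "batch_size": 32,
}
trials = generate_trials(space, num_samples=10)
```

Under these assumptions, three grid values with ten samples give thirty trials, each with its own randomly drawn regularization value.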
Compare the models
with Tensorboard
Reinforcement
learning
https://gym.openai.com/ https://ray.readthedocs.io/en/latest/rllib.html
Wrapping OpenAI gym
environments in actors
import gym
import ray

@ray.remote
class Simulator:
    def __init__(self):
        self.env = gym.make("SpaceInvaders-v0")
        self.env.reset()

    def step(self, action):
        return self.env.step(action)

simulator = Simulator.remote()
# Take actions in the simulator
observations = []
observations.append(simulator.step.remote(0))
observations.append(simulator.step.remote(1))
Remote arrays and
distributed linear
algebra
import ray
from ray.experimental.array.distributed import linalg, random
ray.init()
arr = random.normal.remote((200, 200))
decomposed = linalg.qr.remote(arr)
orthogonal_da, triangular_da = ray.get(decomposed)
orthogonal, triangular = orthogonal_da.assemble(), triangular_da.assemble()
Speed-up your
Pandas pipelines
import modin.pandas as pd
https://github.com/modin-project/modin
Conclusion
A little teaser of Ray
Build and scale your ML and other tools
Systems that adapt, learn online
Even locally as an alternative to threads and
processes
Check out Ray's fantastic tutorials
pip install ray
Thanks!
Read more
Butcher - Seven concurrency models in seven weeks
A note on distributed computing - Waldo J. et al.
Herb Sutter - The Free Lunch Is Over
Fallacies of distrib. computing explained - Rotem-Gal-Oz
Fallacies of distrib. computing - P. Deutsch
Ray docs
Ray tutorial
Plasma store
Plasma store and Arrow
Scaling Python modules with ray framework
Read more
Ray - a cluster computing engine for reinforcement
learning applications
https://ray-project.github.io/2018/07/15/parameter-server-in-fifteen-lines.html
Robert Nishihara - Ray: A Distributed Execution
Framework for AI | SciPy 2018
M. Rocklin - Dask and Celery
Dask comparison to Spark
Ray: A Distributed System for AI
Resources

Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 

Distributed computing and hyper-parameter tuning with Ray

  • 1. Distributed computing and hyper-parameter tuning with Ray. Jan Margeta, November 17, 2018. jan@kardio.me, @jmargeta
  • 2. Healthier hearts; waste reduction; failure prevention. Hi, I am Jan: computer vision and machine learning, Pythonista since 2.5+, founder of KardioMe.
  • 3. Martin Fowler's first rule of distributed objects computing: "Don't". Massive complexity booster. See also: common fallacies of distributed computing.
  • 4. The world is concurrent. Towards real-time decisions.
  • 5. And also: resilience cannot be achieved with a single machine; machine learning workflows often need heterogeneous HW and intensive computations; need to scale up and down on demand. ImageNet in 224 seconds.
  • 6. Concurrency and parallelism in Python: threads, processes, distributed, Dask, Celery, PySpark, async…
  • 7. 3D printed model of your own heart. [Pipeline diagram: CT or MRI image, preprocess, segment, landmark estimation, view estimation, meshing, VR, 3D print. Legend: GPU-based machine learning; CPU-intensive operation; WebVR-based UI; long-running external process.]
  • 9. PySpark: mature, excellent for ETL, simple queries; "BigData" ecosystem in Java; better for homogeneous processing of the points.
        R = matrix(rand(M, F)) * matrix(rand(U, F).T)
        ms = matrix(rand(M, F))
        us = matrix(rand(U, F))
        Rb = sc.broadcast(R)
        msb = sc.broadcast(ms)
        usb = sc.broadcast(us)
        for i in range(ITERATIONS):
            ms = sc.parallelize(range(M), partitions) \
                   .map(lambda x: update(x, usb.value, Rb.value)) \
                   .collect()
            ms = matrix(np.array(ms)[:, :, 0])
            …
      https://github.com/joost-de-vries/spark-sbt-seed/blob/master/src/main/python/als.py
  • 10. Spark barriers vs dynamic task graphs. "Ray: A Distributed Execution Framework for Emerging AI Applications", Michael Jordan (UC Berkeley)
  • 11. Celery: computations defined beforehand; mature, support for retries, rate limiting…; group, chain, chord, map, starmap, chunks…
        from celery import Celery
        app = Celery('jobs', ...)

        @app.task
        def compute_stuff(x, y):
            return x + y

        @app.task
        def another_compute_stuff(x, y):
            return x + y

        from jobs import compute_stuff, another_compute_stuff
        compute_stuff.delay(1, 1).get()
        compute_stuff.apply_async((2, 2), link=another_compute_stuff.s(16))
        compute_stuff.starmap([(2, 2), (4, 4)])
      http://docs.celeryproject.org/en/master/userguide/canvas.html
  • 12. Dask: way more Pythonic than Spark; collections that play well with Python ecosystem; pickle, cloudpickle, msgpack, and custom numpy; global scheduler. https://dask.org/
        import dask

        @dask.delayed
        def add(x, y):
            return x + y

        x = add(1, 2)
        y = add(x, 3)
        y.compute()
  • 13. Requirements: dynamic tasks with stateful computation; play well with existing ML tools in Python; heterogeneous code and hardware; fast with low latency; fault tolerant (node failure / addition / removal); scale from multiple cores to multiple nodes.
  • 14. Ray: "Ray is a general purpose framework for doing parallel and distributed Python along with a collection of libraries targeting machine data processing workflows." Developed at UC Berkeley as an attempt to replace Spark. https://github.com/ray-project/ray
  • 15. Unique components: clean API; stateless tasks and actors combined; bottom-up scheduling; shared object store with zero-copy deserialization.
  • 16. Most* of Ray's API you will ever need. The rest is (mostly) Python as we know it. *Seriously, this is pretty much it.
        ray.init    # connect to a Ray cluster
        ray.remote  # declare a task/actor & remote execution
        ray.get     # retrieve a Ray object and convert to a Python object
        ray.put     # manually place an object to the object store
        ray.wait    # retrieve results as they are made ready
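The "retrieve results as they are made ready" idea behind ray.wait on slide 16 can be illustrated without a cluster. The following is only a local sketch that uses the standard library's concurrent.futures as a stand-in; it is not Ray's API, and `slow_square` and `collect_in_completion_order` are hypothetical names introduced for the illustration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def slow_square(x, delay):
    """Hypothetical task: sleep for `delay` seconds, then return x squared."""
    time.sleep(delay)
    return x * x


def collect_in_completion_order(jobs):
    """Run all jobs concurrently and gather results as each one finishes."""
    results = []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        pending = {pool.submit(slow_square, x, delay) for x, delay in jobs}
        while pending:
            # Block only until at least one future is done, then handle it
            # while the remaining futures keep running.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
    return results
```

Submitting the slowest job first, for example `collect_in_completion_order([(3, 0.4), (2, 0.2), (1, 0.0)])`, should still hand back the fast results first, which is the point of waiting on "whatever is ready" rather than on a fixed order.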
  • 17. Tasks: create a task & schedule it throughout the cluster.
        from glob import glob
        import cv2
        import numpy as np
        import ray

        @ray.remote
        def imread(fname):
            return cv2.imread(fname)

        @ray.remote(num_cpus=1, num_gpus=0)
        def threshold(image, threshold=128):
            return image > threshold

        # Immediately returns future
        future0 = imread.remote('python.png')
        future1 = threshold.remote(np.ones((224, 224)))
        futures = [imread.remote(f) for f in glob('*.png')]
  • 18. Actors: a solution for mutable state. Instantiate the parameter server somewhere on the cluster.
        @ray.remote
        class ParameterServer(object):
            def __init__(self, keys, values):
                # Copy the values, since they will be mutated in place
                values = [value.copy() for value in values]
                self.weights = dict(zip(keys, values))

            def push(self, keys, values):
                for key, value in zip(keys, values):
                    self.weights[key] += value

            def pull(self, keys):
                return [self.weights[key] for key in keys]
  • 19. A single worker
        @ray.remote
        def worker(ps):
            while True:
                # Get the latest parameters
                weights = ray.get(ps.pull.remote(keys))
                # Compute an update of the params
                # (e.g. the gradients for neural nets)
                # Push the updates to the parameter server
                ps.push.remote(keys, gradients)

        ps = ParameterServer.remote(keys, initial_values)
        worker_tasks = [worker.remote(ps) for _ in range(10)]
  • 20. Actors: not only for storing machine learning parameters. Note that pyhikvision is our custom wrapper to a vendor-specific library in Cython (ray works!). When interfacing with cameras, consider the vendor-agnostic and open-source harvester.
        @ray.remote
        class Camera:
            def __init__(self, mac):
                self.cam = pyhikvision.Camera(mac=mac)
                self.cam.open()
                self.num_frames = 0

            def grab(self):
                self.num_frames += 1
                return self.cam.grab_frame()

            def total_frames(self):
                return self.num_frames

        cam = Camera.remote(mac='xxxxxx')
  • 21. Actors need no locks for mutation! Actor methods are always called one by one.
        future0 = cam.grab.remote()
        future1 = cam.total_frames.remote()
        future2 = cam.grab.remote()
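Why one-by-one method execution removes the need for locks (slide 21) can be shown with a small local analogy. This is a sketch under the assumption that an actor behaves like an object whose calls are funneled through a single worker thread; `SerialActor`, `FrameCounter` and `run_demo` are hypothetical names, and none of this is Ray's implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SerialActor:
    """Hypothetical stand-in for an actor: runs the wrapped object's
    methods one at a time on a single dedicated thread."""

    def __init__(self, target):
        self._target = target
        # A single worker thread executes submitted calls strictly in
        # submission order, so the wrapped object needs no locks.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, method_name, *args) -> Future:
        method = getattr(self._target, method_name)
        return self._executor.submit(method, *args)

    def stop(self):
        self._executor.shutdown(wait=True)


class FrameCounter:
    """Mutable state without any locking, like the Camera actor's counter."""

    def __init__(self):
        self.num_frames = 0

    def grab(self):
        self.num_frames += 1
        return self.num_frames

    def total_frames(self):
        return self.num_frames


def run_demo(num_calls=1000):
    actor = SerialActor(FrameCounter())
    futures = [actor.call("grab") for _ in range(num_calls)]
    total = actor.call("total_frames").result()
    results = [future.result() for future in futures]
    actor.stop()
    return total, results
```

Every call returns a future immediately, as with `.remote()`, while the mutation itself happens sequentially inside the actor.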
  • 22. Get the results
        @ray.remote
        def heavy_computation():
            time.sleep(10)
            return np.zeros((224, 224))

        future = heavy_computation.remote()

        # This blocks until the future is done
        arr = ray.get(future)

        # All subsequent calls to ray.get return almost instantly
        arr0 = ray.get(future)
        arr1 = ray.get(future)

        # Reuse the futures
        thumb_future = make_a_thumbnail.remote(future)
        landmarks_future = find_landmarks.remote(future)
  • 23. Create computational graph. Actors and remote functions interoperate seamlessly: benefits of both stateless dataflow and actor frameworks. Functions can take values, futures, or even actor handles as params.
        frame_id = camera.grab.remote()
        thresholded_id = threshold.remote(frame_id)
        thresholded = ray.get(thresholded_id)
  • 24. Define by run JIT
        import numpy as np

        @ray.remote
        def aggregate_data(x, y):
            return x + y

        data = [np.random.normal(size=1000) for i in range(4)]
        while len(data) > 1:
            intermediate_result = aggregate_data.remote(data[0], data[1])
            data = data[2:] + [intermediate_result]
        result = ray.get(data[0])
      https://ray-project.github.io/2017/05/20/announcing-ray.html
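The define-by-run loop on slide 24 builds its task graph while it executes: each new task takes earlier futures as inputs. The same pattern can be tried locally with the standard library as a rough stand-in. This is a sketch, not Ray; `resolve` and `tree_reduce` are hypothetical helpers, and unlike Ray, the task has to unwrap its future arguments explicitly.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def resolve(value):
    """Return the result of a future, or the value itself if it is plain."""
    return value.result() if isinstance(value, Future) else value


def aggregate_data(x, y):
    # Arguments may be plain values or futures produced by earlier tasks.
    return resolve(x) + resolve(y)


def tree_reduce(values, max_workers=4):
    """Pairwise aggregation whose task graph is defined as the loop runs."""
    data = list(values)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(data) > 1:
            # Submitting returns immediately; the graph grows at run time.
            intermediate = pool.submit(aggregate_data, data[0], data[1])
            data = data[2:] + [intermediate]
        return resolve(data[0])
```

Because every task depends only on tasks submitted before it and the pool starts tasks in submission order, the sketch does not deadlock even with a single worker.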
  • 25. Architecture. P. Moritz, R. Nishihara, et al.: Ray: A Distributed Framework for Emerging AI Applications
  • 26. Worker & driver. Workers receive and execute tasks and submit tasks to other workers. The driver is not assigned tasks for execution.
  • 27. Plasma - shared memory object store: share objects across local processes; in-memory key-value object store.
        data = ['Hello PyConBalkan', 4, (5, 5), np.ones((128, 128))]
        key = ray.put(data)
        deserialized = ray.get(key)
  • 28. Apache Arrow serialization: standard objects; Numpy arrays. See https://ray-project.github.io/2017/10/15/fast-python-serialization-with-ray-and-arrow.html
  • 29. Local scheduler: driver can assign a task to a worker; bottom-up scheduling; fractional resources; no more tasks in parallel than the number of CPUs (multithreaded libs - restrict the number of threads...).
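The resource rule on slide 29 (fractional resources, never more work in parallel than the declared CPUs) amounts to simple bookkeeping. Below is a minimal sketch of such accounting, assuming resources are plain named quantities; `ResourcePool` is a hypothetical class for illustration and says nothing about how Ray's scheduler is actually implemented.

```python
class ResourcePool:
    """Hypothetical per-node bookkeeping of available resources."""

    def __init__(self, **capacity):
        self.available = dict(capacity)

    def try_acquire(self, **request):
        """Reserve the requested amounts, or reserve nothing and return False."""
        for name, amount in request.items():
            if self.available.get(name, 0) < amount:
                return False
        for name, amount in request.items():
            self.available[name] -= amount
        return True

    def release(self, **request):
        """Give resources back once a task has finished."""
        for name, amount in request.items():
            self.available[name] += amount


pool = ResourcePool(cpu=2, gpu=1)
first = pool.try_acquire(cpu=1)            # fits
second = pool.try_acquire(cpu=1, gpu=0.5)  # fits, uses half a GPU
third = pool.try_acquire(cpu=1)            # no CPU left, has to wait
```

Floats are used here only for brevity; amounts such as 0.1 would call for exact arithmetic (for example `fractions.Fraction`) to avoid rounding surprises.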
  • 30. Global control state: take all metadata and state out of the system; centralize it in a Redis cluster; everything else is largely stateless now.
  • 32. Fault tolerance: failover to other nodes based on the global control state. Non-actors: lineage-based, rerun the tasks to reconstruct. Actors: (in the future) recreate the actor from the beginning.
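Lineage-based reconstruction (slide 32) means remembering how each result was produced, so a lost result can be recomputed by rerunning its task and, recursively, any lost inputs. The sketch below is a single-process toy model of that idea; `LineageStore` and its methods are hypothetical, and treating `put` as a zero-argument task is a simplification made only for this example.

```python
import itertools


class LineageStore:
    """Toy object store that can rebuild lost results from their lineage."""

    def __init__(self):
        self._next_id = itertools.count()
        self._lineage = {}  # object id -> (function, ids of its arguments)
        self._cache = {}    # object id -> value; entries can be lost
        self.executions = 0

    def put(self, value):
        # Simplification: a stored value is a task with no inputs.
        return self.submit(lambda: value)

    def submit(self, function, *argument_ids):
        object_id = next(self._next_id)
        self._lineage[object_id] = (function, argument_ids)
        self._cache[object_id] = self._execute(object_id)
        return object_id

    def _execute(self, object_id):
        function, argument_ids = self._lineage[object_id]
        # Fetching the arguments may itself trigger reconstruction.
        arguments = [self.get(argument_id) for argument_id in argument_ids]
        self.executions += 1
        return function(*arguments)

    def evict(self, object_id):
        """Simulate losing a result, e.g. because its node failed."""
        self._cache.pop(object_id, None)

    def get(self, object_id):
        if object_id not in self._cache:
            self._cache[object_id] = self._execute(object_id)
        return self._cache[object_id]
```

Evicting both an intermediate result and the final one, then asking for the final one again, reruns exactly those two tasks. This only works for deterministic, stateless tasks, which is consistent with the slide treating actors separately.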
  • 33. Does it scale? [Video: mujoco video.] Moritz, Nishihara et al.: Ray: A Distributed Framework for Emerging AI Applications. OpenAI Baselines: high-quality implementations of reinforcement learning algorithms.
  • 34. On-prem cluster
      Start head:
        ray start --head --redis-port=6379
      Start nodes with workers:
        ray start --redis-address=192.168.1.5:6379  # head IP: 192.168.1.5
      Connect and run commands:
        ray.init(redis_address="192.168.1.5:6379")

        @ray.remote
        def imread(filename):
            return cv2.imread(filename)

        ims = ray.get([imread.remote(f) for f in glob('*.png')])
      Teardown - stop ray process:
        ray stop
  • 35. On the cloud: ready-made auto-scaling scripts for AWS and GCP, or write a custom provider.
      Create a cluster:
        ray up ray/python/ray/autoscaler/aws/example-full.yaml
      Destroy:
        ray down ray/python/ray/autoscaler/aws/example-full.yaml
      https://ray.readthedocs.io/en/latest/using-ray-on-a-large-cluster.html
  • 36. Developing with Ray. Testing: usually trivial, in → out well defined. Debugging: web UI, breakpoint() or ipdb.set_trace().
  • 37. Higher-level libs built on top of Ray: Tune, RLlib, Modin, distributed linear algebra, …
  • 39. Function-based API. A good idea to extract all training params anyway.
        def my_tunable_function(config, reporter):
            train_data, test_data = make_data_loaders(config)
            model = make_model(config)
            trainer = make_optimizer(model, config)
            for epoch in range(10):  # Could be an infinite loop too
                train(model, trainer, train_data)
                accuracy = evaluate(model, test_data)
                reporter(mean_accuracy=accuracy)
  • 40. Class-based API
        class MyTunableClass(Trainable):
            def _setup(self, config):
                self.train_data, self.test_data = make_data_loaders(config)
                self.model = make_model(config)
                self.trainer = make_optimizer(self.model, config)

            def _train(self):
                train_for_a_while(self.model, self.train_data, self.trainer)
                return {"mean_accuracy": eval_model(self.model, self.test_data)}

            def _save(self, checkpoint_dir):
                return save_model(self.model, checkpoint_dir)

            def _restore(self, checkpoint_path):
                self.model.load_state_dict(checkpoint_path)
  • 41. Experiment config
        experiment_spec = Experiment(
            "experiment_name",
            my_tunable_function_or_class,
            stop={"mean_accuracy": 98.5},
            config={
                "learning_rate": tune.grid_search([0.001, 0.01, 0.1]),
                "regularization": lambda x: 10 * np.random.rand(1),
            },
            trial_resources={"cpu": 1, "gpu": 0},
            num_samples=10
        )
        run_experiments(experiments=experiment_spec)
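How many trials the experiment config on slide 41 produces is easy to misjudge. The sketch below assumes that the grid is repeated num_samples times and that callable entries are re-sampled for every trial, which is my reading of Tune's documented behaviour; it is not Tune's code. `GridSearch` and `expand_trials` are hypothetical names.

```python
import itertools
import random


class GridSearch:
    """Hypothetical marker meaning 'try every one of these values'."""

    def __init__(self, values):
        self.values = list(values)


def expand_trials(config, num_samples):
    """Expand a search space into a flat list of concrete trial configs."""
    grid_keys = [key for key, value in config.items()
                 if isinstance(value, GridSearch)]
    grid_values = [config[key].values for key in grid_keys]

    trials = []
    for _ in range(num_samples):
        # One full pass over the grid per sample.
        for combination in itertools.product(*grid_values):
            trial = dict(zip(grid_keys, combination))
            for key, value in config.items():
                if key in trial:
                    continue
                # Callables are sampled anew for each trial.
                trial[key] = value(trial) if callable(value) else value
            trials.append(trial)
    return trials


config = {
    "learning_rate": GridSearch([0.001, 0.01, 0.1]),
    "regularization": lambda spec: 10 * random.random(),
}
trials = expand_trials(config, num_samples=10)
```

Under these assumptions, three learning rates times ten samples gives thirty trials, each with its own randomly drawn regularization value.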
  • 42. Compare the models with TensorBoard
  • 44. Wrapping OpenAI gym environments in actors
        import gym

        @ray.remote
        class Simulator:
            def __init__(self):
                self.env = gym.make("SpaceInvaders-v0")
                self.env.reset()

            def step(self, action):
                return self.env.step(action)

        simulator = Simulator.remote()

        # Take actions in the simulator
        observations = []
        observations.append(simulator.step.remote(0))
        observations.append(simulator.step.remote(1))
  • 45. Remote arrays and distributed linear algebra
        import ray
        from ray.experimental.array.distributed import linalg, random

        ray.init()
        arr = random.normal.remote((200, 200))
        decomposed = linalg.qr.remote(arr)
        orthogonal_da, triangular_da = ray.get(decomposed)
        orthogonal, triangular = orthogonal_da.assemble(), triangular_da.assemble()
  • 46. Speed up your Pandas pipelines: import modin.pandas as pd. https://github.com/modin-project/modin
  • 47. Conclusion: a little teaser of Ray. Build and scale your ML and other tools; systems that adapt, learn online; even locally as an alternative to threads and processes. Check out Ray's fantastic tutorials. pip install ray
  • 48. Thanks! Distributed computing and hyper-parameter tuning with Ray. Jan Margeta, November 17, 2018. jan@kardio.me, @jmargeta
  • 49. Read more: Butcher - Seven concurrency models in seven weeks; A note on distributed computing - Waldo J. et al.; Herb Sutter - Free lunch is over; Fallacies of distrib. computing explained - Rotem-Gal-Oz; Fallacies of distrib. computing - P. Deutsch; Ray docs; Ray tutorial; Plasma store; Plasma store and Arrow; Scaling Python modules with Ray framework
  • 50. Read more: Ray - a cluster computing engine for reinforcement learning applications; https://ray-project.github.io/2018/07/15/parameter-server-in-fifteen-lines.html; Robert Nishihara - Ray: A Distributed Execution Framework for AI | SciPy 2018; M. Rocklin - Dask and Celery; Dask comparison to Spark; Ray: A Distributed System for AI; Resources