Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/ GitHub: https://github.com/data-science-on-aws/
Ray AI Runtime (AIR) on AWS: Distributed ML with Amazon SageMaker, EC2, EMR, and EKS!
GitHub repo: https://github.com/data-science-on-aws
Recordings: https://youtube.datascienceonaws.com
Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/
Speakers
Chris Fregly, Principal Solution Architect, AI/ML @ AWS
Antje Barth, Principal Developer Advocate, AI/ML @ AWS
Apoorva Kulkarni, Senior Solution Architect, Containers @ AWS
What is Ray?
Frictionless transition from research to production
Encourages iterative development and debugging
Environment management: “conda as a service”; auto-syncs files across the cluster
Makes TensorFlow, PyTorch, scikit-learn, and everything else as easy to scale as Spark
Ray ecosystem
Scale from laptop to cluster
Scale from laptop to cluster
ray up cluster.yaml
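`ray up` launches a cluster from a cluster launcher config file. The slide does not show `cluster.yaml`, so what follows is a minimal sketch under stated assumptions: the cluster name, AWS region, SSH user, instance types, and worker limits are all hypothetical, and optional fields are left to Ray's defaults. The config is built as a Python dict and serialized as JSON, which YAML parsers also accept.

```python
import json

# Hypothetical values: cluster name, region, SSH user, instance types, and limits are assumptions.
cluster_config = {
    "cluster_name": "ray-air-demo",
    "max_workers": 4,  # cluster-wide cap on worker nodes
    "provider": {"type": "aws", "region": "us-west-2"},
    "auth": {"ssh_user": "ubuntu"},
    "available_node_types": {
        "ray.head.default": {
            "resources": {},
            "node_config": {"InstanceType": "m5.2xlarge"},
        },
        "ray.worker.default": {
            "min_workers": 0,  # let the autoscaler scale workers down to zero
            "max_workers": 4,
            "resources": {},
            "node_config": {"InstanceType": "m5.2xlarge"},
        },
    },
    "head_node_type": "ray.head.default",
}


def render_cluster_yaml(config):
    """Serialize the config as JSON text; JSON is also readable by YAML parsers."""
    return json.dumps(config, indent=2)


def write_cluster_yaml(config, path):
    """Write the rendered config to `path` (for example, cluster.yaml)."""
    with open(path, "w") as f:
        f.write(render_cluster_yaml(config))
```

For example, `write_cluster_yaml(cluster_config, "cluster.yaml")` followed by `ray up cluster.yaml` would start the head node; the autoscaler can then add up to `max_workers` worker nodes as load requires.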
Ray Clusters
ray cluster-dump
Ray Quick Start on AWS
Ray Autoscale
Ray Dashboard
Frictionless transition from research to production
Local development
Remote cluster development
Remote cluster production job
Local development: local laptop and conda
# Run the training script locally (pytorch-huggingface-clothing.py plays the role of train.py).
# Hyper-parameters: --num_train_epochs, --max_length
# --num_workers: number of workers (i.e., CPUs or GPUs)
# --model_name_or_path: base BERT-family model (RoBERTa)
python pytorch-huggingface-clothing.py \
  --num_train_epochs 1 \
  --max_length 64 \
  --num_workers 4 \
  --model_name_or_path roberta-base \
  --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
  --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
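The training script itself is not shown on the slide. As a hypothetical sketch, it could parse these flags with `argparse`; only the flag names come from the commands in this deck, while the types and defaults are assumptions. Keeping every setting on the command line is what lets the same file run locally with 4 workers and on the cluster with 64.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags passed to pytorch-huggingface-clothing.py."""
    parser = argparse.ArgumentParser(description="Fine-tune a BERT-family model on review text.")
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument("--max_length", type=int, default=64, help="max tokens per review")
    parser.add_argument("--num_workers", type=int, default=4, help="number of CPU or GPU workers")
    parser.add_argument("--model_name_or_path", default="roberta-base")
    parser.add_argument("--train_file", required=True)
    parser.add_argument("--validation_file", required=True)
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```

Inside the script, `args = parse_args()` would then feed `args.num_workers` into whatever distributed trainer the script uses.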
Remote development: cluster and cluster-scope conda
# Run the same Python script on the Ray cluster (the script plays the role of train.py).
# Hyper-parameters: --num_train_epochs, --max_length; --num_workers: number of workers
ray submit cluster.yaml pytorch-huggingface-clothing.py \
  --num_train_epochs 10 \
  --max_length 64 \
  --num_workers 64 \
  --model_name_or_path roberta-base \
  --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
  --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
Remote cluster production jobs: specify a conda YAML per job
# --working-dir .      copy everything in this directory and below to the cluster
# --runtime-env FILE   conda environment YAML for this job
# --address URL        Ray dashboard address, port-forwarded to the cluster
# After "--": the entrypoint command; --num_workers is the number of workers (i.e., CPUs or GPUs)
ray job submit \
  --working-dir . \
  --runtime-env job-pytorch-huggingface-clothing-runtime.yaml \
  --address http://127.0.0.1:8265 \
  -- python pytorch-huggingface-clothing.py \
  --num_train_epochs 1 \
  --max_length 64 \
  --num_workers 4 \
  --model_name_or_path roberta-base \
  --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
  --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
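The contents of `job-pytorch-huggingface-clothing-runtime.yaml` are not shown. A Ray runtime environment is a mapping whose documented keys include `conda`, `pip`, `env_vars`, and `working_dir`; `working_dir` is left out of this sketch because the command passes `--working-dir` separately. The package list and the environment variable below are assumptions, and the result is serialized as JSON, which YAML parsers also accept.

```python
import json

# Hypothetical runtime environment; package names and the env var are assumptions.
runtime_env = {
    # Conda environment spec, in the same shape as an environment.yml file.
    "conda": {
        "dependencies": [
            "python=3.9",
            "pip",
            {"pip": ["torch", "transformers", "datasets"]},
        ]
    },
    # Environment variables set in the job's worker processes.
    "env_vars": {"TOKENIZERS_PARALLELISM": "false"},
}


def render_runtime_env(env):
    """Serialize the runtime env as JSON text; JSON is also readable by YAML parsers."""
    return json.dumps(env, indent=2)
```

Writing `render_runtime_env(runtime_env)` to the file named in `--runtime-env` gives this job its own conda environment, which is the “conda as a service” idea: different jobs can bring different dependencies to the same cluster.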
Ray environment management (“conda as a service”)
Ray debugging
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/debug
Ray AIR (AI Runtime)
Ray AIR (AI Runtime) - Quickstart
Ray Data - not (yet) a DataFrame abstraction (i.e., no joins)
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
Modin: Pandas on Ray
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
RayDP: Spark on Ray
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
Ray Tune
from ray import tune

# 1. Define an objective function.
def objective(config):
    score = config["a"] ** 2 + config["b"]
    return {"score": score}

# 2. Define a search space.
search_space = {
    "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
    "b": tune.choice([1, 2, 3]),
}

# 3. Start a Tune run and print the best result.
analysis = tune.run(objective, config=search_space)
print(analysis.get_best_config(metric="score", mode="min"))
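In the Tune example above, `tune.grid_search` values are enumerated exhaustively while `tune.choice` is sampled independently for each trial, so with the default of one sample this search space yields 4 trials (one per value of `a`), each with a random `b`. The plain-Python sketch below imitates that expansion without Ray to make the trial count concrete; it is an illustration, not Tune's implementation, and `expand_search_space` is a hypothetical helper.

```python
import itertools
import random


def objective(config):
    """Same objective as the Tune example: a**2 + b."""
    return {"score": config["a"] ** 2 + config["b"]}


def expand_search_space(grid, choices, num_samples=1, seed=0):
    """Enumerate every grid combination num_samples times; sample each choice per trial."""
    rng = random.Random(seed)
    keys = list(grid)
    trials = []
    for _ in range(num_samples):
        for values in itertools.product(*(grid[k] for k in keys)):
            config = dict(zip(keys, values))
            for name, options in choices.items():
                config[name] = rng.choice(options)  # independent draw for each trial
            trials.append(config)
    return trials


grid = {"a": [0.001, 0.01, 0.1, 1.0]}
choices = {"b": [1, 2, 3]}
trials = expand_search_space(grid, choices)
best = min(trials, key=lambda config: objective(config)["score"])
print(len(trials), "trials; best config:", best)
```

Raising `num_samples` repeats the whole grid, so `num_samples=3` would give 12 trials; because `b` is random, the best config can differ between runs.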
Ray RLlib: Initial beachhead for Ray
Ray Reinforcement Learning
Ray Data & Ray Train/Tune
Ray Serve
Serving framework built on Ray
Python-native: supports any Python code, ML framework, etc.
Composes multiple ML models into a deployment graph
Ray Serve: NLP inference pipeline with HuggingFace
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
Ray Serve: combine 2 NLP models, average the predictions
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
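The code for combining the two models is in the linked repo rather than on the slide. The averaging step itself can be sketched in plain Python: two hypothetical models each return class probabilities for the same input, and the ensemble returns their element-wise mean. In the Ray Serve version each model would be its own deployment; this sketch only shows the arithmetic, and `model_a`/`model_b` are stand-ins.

```python
from typing import Callable, List, Sequence

Model = Callable[[str], List[float]]


def average_predictions(models: Sequence[Model], text: str) -> List[float]:
    """Run every model on the same text and average their class probabilities."""
    outputs = [model(text) for model in models]
    num_classes = len(outputs[0])
    if any(len(o) != num_classes for o in outputs):
        raise ValueError("all models must return the same number of classes")
    return [sum(o[i] for o in outputs) / len(outputs) for i in range(num_classes)]


# Hypothetical stand-ins for two sentiment models: [negative, positive] probabilities.
def model_a(text: str) -> List[float]:
    return [0.2, 0.8]


def model_b(text: str) -> List[float]:
    return [0.4, 0.6]


print(average_predictions([model_a, model_b], "Ray Serve is great!"))
```

Averaging probabilities (rather than labels) keeps each model's confidence in the combined prediction.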
Ray Serve: DAGDriver (HTTP server)
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
Ray Serve: build a DAG with HTTP inputs
InputNode(): HTTP inputs to the DAG
bind(): graph-building API on the decorated body
serve.run(): runs the deployment graph
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
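To make the `bind()` idea concrete: binding records a call without executing it, so the code builds a graph that is only evaluated when it is run with an input. The toy below is a plain-Python illustration of that deferred-execution pattern using our own hypothetical names (`Input`, `Bound`, `bind`, `run`); it is not Ray Serve's API or implementation.

```python
from typing import Any, Callable


class Input:
    """Placeholder for the value supplied at run time (the role InputNode plays)."""


class Bound:
    """A recorded call: a function plus arguments that may themselves be graph nodes."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def bind(fn: Callable[..., Any], *args: Any) -> Bound:
    # Nothing runs here; we only record an edge in the graph.
    return Bound(fn, *args)


def run(node: Any, value: Any) -> Any:
    """Evaluate the graph bottom-up, substituting `value` for every Input."""
    if isinstance(node, Input):
        return value
    if isinstance(node, Bound):
        return node.fn(*(run(arg, value) for arg in node.args))
    return node  # constants pass through unchanged


def length_parity(text: str) -> int:  # hypothetical model A
    return len(text) % 2


def great_count(text: str) -> int:  # hypothetical model B
    return text.count("great")


def average(a: float, b: float) -> float:
    return (a + b) / 2


# Both "models" read the same input; their outputs are then combined.
inp = Input()
combined = bind(average, bind(length_parity, inp), bind(great_count, inp))
print(run(combined, "Ray Serve is great!"))
```

Separating graph construction from execution is what lets a serving framework deploy each node independently before any request arrives.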
Ray Serve: submit a long-running “serve job” to the cluster
ray job submit \
  --working-dir . \
  --runtime-env job-serve-runtime.yaml \
  -- python serve-dag-huggingface.py
Ray Serve: run sample predictions with an HTTP client
import requests

# Replace <cluster_host> with the host that serves the deployment graph.
input_text_list = [
    "Ray Serve is great!",
    "Serving frameworks without DAG support are not great.",
]

for input_text in input_text_list:
    prediction = requests.get(
        "http://<cluster_host>:8080/invocations", data=input_text
    ).text
    print("Prediction for '{}' is {}".format(input_text, prediction))
Ray Workflows
High-performance, durable application workflows
Large-scale workflows (i.e., ML and data pipelines)
Long-running business workflows (when used with Ray Serve)
read_data() → preprocessing() → train() → validate()
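The durability idea can be illustrated without Ray: persist each step's output and, on a re-run, load stored results instead of recomputing them. The plain-Python sketch below chains the four steps named on this slide (their bodies are hypothetical placeholders) and checkpoints each result as JSON in a storage directory. Ray Workflows provides this with its own storage layer and API, which this sketch does not reproduce.

```python
import json
import os


def read_data():
    return [3.0, 1.0, 2.0]  # placeholder dataset


def preprocessing(rows):
    return sorted(rows)


def train(rows):
    return {"mean": sum(rows) / len(rows)}  # toy "model"


def validate(model):
    return {"model": model, "ok": model["mean"] > 0}


def run_durable(storage_dir, workflow_id, log=None):
    """Run the steps in order, checkpointing each output under storage_dir/workflow_id."""
    step_dir = os.path.join(storage_dir, workflow_id)
    os.makedirs(step_dir, exist_ok=True)
    steps = [
        ("read_data", lambda _: read_data()),
        ("preprocessing", preprocessing),
        ("train", train),
        ("validate", validate),
    ]
    result = None
    for name, fn in steps:
        path = os.path.join(step_dir, name + ".json")
        if os.path.exists(path):
            with open(path) as f:
                result = json.load(f)  # reuse the durable checkpoint
            continue
        result = fn(result)
        if log is not None:
            log.append(name)  # record which steps actually executed
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(result, f)
        os.replace(tmp, path)  # atomic rename: no half-written checkpoints
    return result
```

Re-running with the same `workflow_id` after a crash resumes from the last completed step, because every earlier step's output is read back from storage.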
Ray Workflows - Define steps
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/workflow
Ray Workflows - Initialize storage, setup and run workflow
Set up the workflow DAG
workflow.run(): start the workflow DAG
Workflow execution is durably logged to storage
https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/workflow
Ray + Kubernetes
KubeRay
https://shopify.engineering/merlin-shopify-machine-learning-platform
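KubeRay runs Ray on Kubernetes (for example, Amazon EKS) through a `RayCluster` custom resource. The manifest is not shown in the deck, so this is a sketch with assumed values: the apiVersion `ray.io/v1` matches KubeRay 1.x (older releases used `ray.io/v1alpha1`), and the image tag, resource sizes, and replica counts are hypothetical. It is rendered as JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json

# Assumed Ray image; keep the tag and spec.rayVersion in sync.
RAY_VERSION = "2.9.0"
ray_image = "rayproject/ray:" + RAY_VERSION


def ray_container(name, cpu, memory):
    """Container spec with equal requests and limits (sizes are hypothetical)."""
    return {
        "name": name,
        "image": ray_image,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        },
    }


ray_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "ray-air-demo"},
    "spec": {
        "rayVersion": RAY_VERSION,
        "headGroupSpec": {
            "rayStartParams": {"dashboard-host": "0.0.0.0"},
            "template": {"spec": {"containers": [ray_container("ray-head", "2", "8Gi")]}},
        },
        "workerGroupSpecs": [
            {
                "groupName": "cpu-workers",
                "replicas": 2,
                "minReplicas": 0,
                "maxReplicas": 4,
                "rayStartParams": {},
                "template": {"spec": {"containers": [ray_container("ray-worker", "4", "16Gi")]}},
            }
        ],
    },
}


def render_manifest(resource):
    """Serialize the custom resource as JSON for kubectl."""
    return json.dumps(resource, indent=2)
```

With the KubeRay operator installed, writing `render_manifest(ray_cluster)` to a file and running `kubectl apply -f` on it would ask the operator to create the head pod and worker pods.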
Demos: Ray on AWS
https://github.com/data-science-on-aws/data-science-on-aws/tree/2a968ba/wip/ray


Ray AI Runtime (AIR) on AWS - Data Science On AWS Meetup

  • 1. Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/ GitHub: https://github.com/data-science-on-aws/ Ray AI Runtime (AIR) on AWS: Distributed ML with Amazon SageMaker, EC2, EMR, and EKS! GitHub repo: https://github.com/data-science-on-aws Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/
  • 2. Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/ GitHub: https://github.com/data-science-on-aws/ Speakers Chris Fregly Principal Solution Architect, AI/ML @ AWS Antje Barth Principal Developer Advocate, AI/ML @ AWS 2 Apoorva Kulkarni Senior Solution Architect, Containers @ AWS
  • 3. Recordings: https://youtube.datascienceonaws.com Book: https://www.amazon.com/Data-Science-AWS-End-End/dp/1492079391/ GitHub: https://github.com/data-science-on-aws/ What is Ray? 3 Friction-less transition from research to production Encourages iterative development and debugging Env management: “conda as a service”, auto-syncs files across cluster Makes TensorFlow/PyTorch/Scikit/Everything as easy to scale as Spark
  • 4. Ray ecosystem
  • 5. Scale from laptop to cluster
  • 6. Scale from laptop to cluster
  • 7. ray up cluster.yaml
  • 8. Ray Clusters
  • 9. ray cluster-dump
  • 10. Ray Quick Start on AWS
  • 11. Ray Autoscale
  • 12. Ray Dashboard
  • 13. Frictionless transition from research to production: local development, remote cluster development, remote cluster production job
  • 14. Local development: local laptop and conda
      # Run the training script (train.py) on a laptop with a local conda environment.
      # --num_train_epochs, --max_length: hyper-parameters
      # --num_workers: number of workers (i.e., CPUs or GPUs)
      # --model_name_or_path: pretrained base model (RoBERTa, a BERT variant)
      python pytorch-huggingface-clothing.py \
          --num_train_epochs 1 \
          --max_length 64 \
          --num_workers 4 \
          --model_name_or_path roberta-base \
          --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
          --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
      https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
  • 15. Remote development: cluster and cluster-scoped conda
      # Run the same Python script on the Ray cluster; script arguments follow "--".
      # --num_train_epochs, --max_length: hyper-parameters
      # --num_workers: number of workers
      # --model_name_or_path: pretrained base model (RoBERTa, a BERT variant)
      ray submit cluster.yaml pytorch-huggingface-clothing.py -- \
          --num_train_epochs 10 \
          --max_length 64 \
          --num_workers 64 \
          --model_name_or_path roberta-base \
          --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
          --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
      https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
  • 16. Remote cluster production jobs: specify a conda YAML per job
      # --working-dir .   copy everything from this directory and below to the cluster
      # --runtime-env     conda environment YAML for this job
      # --address         Ray job server, port-forwarded from the cluster to localhost
      # --num_workers     number of workers (i.e., CPUs or GPUs)
      ray job submit \
          --working-dir . \
          --runtime-env job-pytorch-huggingface-clothing-runtime.yaml \
          --address http://127.0.0.1:8265 \
          -- python pytorch-huggingface-clothing.py \
          --num_train_epochs 1 \
          --max_length 64 \
          --num_workers 4 \
          --model_name_or_path roberta-base \
          --train_file ./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv \
          --validation_file ./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv
      https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/train
  • 17. Ray environment management (“conda as a service”)
  • 18. Ray debugging https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/debug
  • 19. Ray AIR (AI Runtime)
  • 20. Ray AIR (AI Runtime): Quickstart
  • 21. Ray Data: not (yet) a DataFrame abstraction (i.e., no joins) https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
  • 22. Modin: pandas on Ray https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
  • 23. RayDP: Spark on Ray https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/datasets
  • 24. Ray Tune
      from ray import tune

      # 1. Define an objective function.
      def objective(config):
          score = config["a"] ** 2 + config["b"]
          return {"score": score}

      # 2. Define a search space.
      search_space = {
          "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
          "b": tune.choice([1, 2, 3]),
      }

      # 3. Start a Tune run and print the best result.
      analysis = tune.run(objective, config=search_space)
      print(analysis.get_best_config(metric="score", mode="min"))
  • 25. Ray RLlib: initial beachhead for Ray. Ray reinforcement learning; Ray Data & Ray Train/Tune
  • 26. Ray Serve: serving framework on Ray. Python-native: supports any Python code, any ML framework, etc.; composes multiple ML models into a deployment graph
  • 27. Ray Serve: NLP inference pipeline with HuggingFace https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
  • 28. Ray Serve: combine 2 NLP models and average the predictions https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
  • 29. Ray Serve: DAGDriver (HTTP server) https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
  • 30. Ray Serve: build a DAG with HTTP inputs. InputNode(): HTTP inputs to the DAG; bind(): graph-building API on @serve.deployment-decorated classes and functions; serve.run(): runs the deployment graph https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/serve
  • 31. Ray Serve: submit a long-running “serve job” to the cluster
      ray job submit \
          --working-dir . \
          --runtime-env job-serve-runtime.yaml \
          -- python serve-dag-huggingface.py
  • 32. Ray Serve: run sample predictions with an HTTP client
      import requests

      input_text_list = ["Ray Serve is great!", "Serving frameworks without DAG support are not great."]

      for input_text in input_text_list:
          prediction = requests.get("http://<cluster_host>:8080/invocations", data=input_text).text
          print("Prediction for '{}' is {}".format(input_text, prediction))
  • 33. Ray Workflows: high-performance, durable application workflows; large-scale workflows (i.e., ML and data pipelines); long-running business workflows (when used with Ray Serve). Example pipeline: read_data() → preprocessing() → train() → validate()
  • 34. Ray Workflows: define steps https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/workflow
  • 35. Ray Workflows: initialize storage, then set up and run the workflow. Set up the workflow DAG; workflow.run() starts the workflow DAG; workflow execution is durably logged to storage https://github.com/data-science-on-aws/data-science-on-aws/tree/5b5ed1a/wip/ray/workflow
  • 36. Ray + Kubernetes: KubeRay https://shopify.engineering/merlin-shopify-machine-learning-platform
  • 37. Demos: Ray on AWS https://github.com/data-science-on-aws/data-science-on-aws/tree/2a968ba/wip/ray
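Slides 14-16 run the same training script locally, with `ray submit`, and with `ray job submit`, changing only the hyper-parameter values. A minimal sketch of the command-line interface those invocations imply, assuming the script parses its flags with `argparse`: the flag names come from the slides, but `build_parser`, the types, and the defaults are hypothetical, and the real `pytorch-huggingface-clothing.py` in the linked repo may differ.

```python
import argparse


# Hypothetical CLI for pytorch-huggingface-clothing.py: flag names come from slides 14-16,
# but the types and defaults below are assumptions, not taken from the linked repo.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fine-tune a transformer on clothing reviews")
    parser.add_argument("--num_train_epochs", type=int, default=1)       # hyper-parameter
    parser.add_argument("--max_length", type=int, default=64)            # hyper-parameter (assumed: max sequence length)
    parser.add_argument("--num_workers", type=int, default=4)            # number of workers (CPUs or GPUs)
    parser.add_argument("--model_name_or_path", default="roberta-base")  # pretrained base model
    parser.add_argument("--train_file", required=True)
    parser.add_argument("--validation_file", required=True)
    return parser


TRAIN = "./data/train/part-algo-1-womens_clothing_ecommerce_reviews.csv"
VALID = "./data/validation/part-algo-1-womens_clothing_ecommerce_reviews.csv"
common = ["--max_length", "64", "--model_name_or_path", "roberta-base",
          "--train_file", TRAIN, "--validation_file", VALID]

# Laptop run (slide 14) and cluster run (slide 15): same script, different hyper-parameters.
local_args = build_parser().parse_args(["--num_train_epochs", "1", "--num_workers", "4"] + common)
cluster_args = build_parser().parse_args(["--num_train_epochs", "10", "--num_workers", "64"] + common)
print(local_args.num_workers, cluster_args.num_workers)  # 4 64
```

Keeping every environment-specific choice (epochs, worker count) on the command line is what lets one file move from laptop to cluster unchanged.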
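In Ray Tune, `tune.grid_search` expands to every listed value while `tune.choice` draws one random value per trial, so with the default `num_samples=1` the run on slide 24 launches 4 trials (one per value of `a`) and samples `b` independently for each. For comparison, a plain-Python sketch (no Ray) that evaluates the same objective on all 12 `(a, b)` combinations; `exhaustive_best` is a hypothetical helper, not part of Tune.

```python
import itertools


def objective(config):
    # Same objective as slide 24: lower a**2 + b is better.
    score = config["a"] ** 2 + config["b"]
    return {"score": score}


A_VALUES = [0.001, 0.01, 0.1, 1.0]  # the tune.grid_search values on slide 24
B_VALUES = [1, 2, 3]                # the tune.choice values on slide 24


def exhaustive_best(a_values, b_values):
    """Evaluate every (a, b) pair and return (best config, best score, number of trials)."""
    results = []
    for a, b in itertools.product(a_values, b_values):
        config = {"a": a, "b": b}
        results.append((objective(config)["score"], config))
    best_score, best_config = min(results, key=lambda r: r[0])
    return best_config, best_score, len(results)


best_config, best_score, n_trials = exhaustive_best(A_VALUES, B_VALUES)
print(best_config)  # {'a': 0.001, 'b': 1}
```

The exhaustive sweep is only practical because this space is tiny; Tune's random sampling over `b` trades that guarantee for a fixed trial budget.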
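Slides 28-30 build a Ray Serve deployment graph: `InputNode()` marks where the HTTP request enters, `bind()` records calls without running them, and `serve.run()` deploys the result. The plain-Python sketch below mimics only that deferred-execution idea and the averaging of two models' predictions. `InputPlaceholder`, `Node`, `bind`, `model_a`, `model_b`, and `average` are hypothetical stand-ins, not Ray Serve classes, and the keyword "models" replace the HuggingFace pipelines used in the linked repo.

```python
from typing import Any, Callable


class InputPlaceholder:
    """Marks where the request payload enters the graph (hypothetical stand-in for InputNode())."""


class Node:
    """A deferred call: bind() records fn and its arguments; nothing runs until execute()."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args

    def execute(self, request: Any) -> Any:
        resolved = []
        for arg in self.args:
            if isinstance(arg, InputPlaceholder):
                resolved.append(request)                # substitute the incoming request
            elif isinstance(arg, Node):
                resolved.append(arg.execute(request))   # run upstream nodes first
            else:
                resolved.append(arg)                    # plain constant argument
        return self.fn(*resolved)


def bind(fn: Callable[..., Any], *args: Any) -> Node:
    return Node(fn, *args)


calls = []  # records which "models" actually ran


def model_a(text: str) -> float:
    """Hypothetical sentiment model: probability that the text is positive."""
    calls.append("model_a")
    return 0.9 if "great" in text else 0.2


def model_b(text: str) -> float:
    """A second hypothetical model with different confidence levels."""
    calls.append("model_b")
    return 0.7 if "great" in text else 0.4


def average(*scores: float) -> float:
    return sum(scores) / len(scores)


inp = InputPlaceholder()
graph = bind(average, bind(model_a, inp), bind(model_b, inp))
print(calls)  # [] : building the graph ran nothing
score = graph.execute("Ray Serve is great!")
print(round(score, 2))  # 0.8
```

Separating graph construction from execution is what lets a serving framework place each model on its own workers before any request arrives.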
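Slide 32's client sends each sentence as the body of an HTTP GET to `/invocations`. To try that request shape without a cluster, this sketch starts a throwaway local server using only Python's standard library and a fake keyword "model". `InvocationsHandler`, `predict`, and the labels it returns are hypothetical; it uses `urllib` instead of `requests` to avoid a third-party dependency, and it does not describe how Ray Serve's DAGDriver actually parses requests.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InvocationsHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the Serve endpoint on slide 32 (no real model)."""

    def do_GET(self) -> None:
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        # Fake "model": negative if the word "not" appears, positive otherwise.
        label = "negative" if "not" in text.lower().split() else "positive"
        body = label.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


# Ignore any proxy environment variables so the request stays on localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def predict(base_url: str, text: str) -> str:
    # Same request shape as slide 32: GET /invocations with the raw text as the request body.
    req = urllib.request.Request(base_url + "/invocations", data=text.encode("utf-8"), method="GET")
    with opener.open(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


server = ThreadingHTTPServer(("127.0.0.1", 0), InvocationsHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:{}".format(server.server_address[1])

predictions = {}
for input_text in ["Ray Serve is great!", "Serving frameworks without DAG support are not great."]:
    predictions[input_text] = predict(base_url, input_text)
    print("Prediction for '{}' is {}".format(input_text, predictions[input_text]))

server.shutdown()
server.server_close()
```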
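Slide 35 notes that workflow execution is durably logged to storage. A conceptual, stdlib-only sketch of that idea (not Ray Workflows' API or storage format): each step's output is written to a storage directory, and re-running the same workflow replays logged results instead of recomputing them. `run_durable`, `counted`, and the step bodies are hypothetical; only the step names `read_data`, `preprocessing`, `train`, and `validate` come from slide 33.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_durable(steps: List[Step], storage_dir: str, initial: Any = None) -> Any:
    """Run steps in order, logging each step's JSON output so a re-run resumes instead of recomputing."""
    os.makedirs(storage_dir, exist_ok=True)
    value = initial
    for name, fn in steps:
        path = os.path.join(storage_dir, name + ".json")
        if os.path.exists(path):
            with open(path) as f:
                value = json.load(f)  # step already completed: reuse its logged output
            continue
        value = fn(value)
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(value, f)
        os.replace(tmp_path, path)  # publish the log entry only after it is fully written
    return value


calls: Dict[str, int] = {}


def counted(name: str, fn: Callable[[Any], Any]) -> Step:
    """Wrap a step so we can see how many times it really executes."""
    def wrapper(value: Any) -> Any:
        calls[name] = calls.get(name, 0) + 1
        return fn(value)
    return (name, wrapper)


# Step names from slide 33; the bodies are toy placeholders.
steps = [
    counted("read_data", lambda _: [3.0, 1.0, 2.0]),
    counted("preprocessing", lambda xs: sorted(xs)),
    counted("train", lambda xs: {"mean": sum(xs) / len(xs)}),
    counted("validate", lambda model: model["mean"] == 2.0),
]

storage_dir = tempfile.mkdtemp()
first = run_durable(steps, storage_dir)
second = run_durable(steps, storage_dir)  # every step is replayed from storage
print(first, second, calls)
```

Writing to a temporary file and then renaming it means a crash mid-write leaves no partial log entry, so a resumed run either reuses a complete result or recomputes the step.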