SlideShare a Scribd company logo
Relay: A New IR for Machine Learning Frameworks
Jared Roesch Steven Lyubomirsky Logan Weber Josh Pollock
jroesch@cs.uw.edu sslyu@cs.uw.edu weberlo@cs.uw.edu joshpoll@cs.uw.edu
Marisa Kirisame Tianqi Chen Zachary Tatlock
jerry96@cs.uw.edu tqchen@cs.uw.edu ztatlock@cs.uw.edu
Paul G. Allen School of Computer Science and Engineering
University of Washington, Seattle, WA, USA
Abstract
Machine learning powers diverse services in industry in-
cluding search, translation, recommendation systems, and
security. The scale and importance of these models require
that they be ecient, expressive, and portable across an ar-
ray of heterogeneous hardware devices. These constraints
1 Introduction
Machine learning (ML) has dramatically reshaped computer
vision [17, 21], natural language processing, robotics [14],
and computational biology and is continuing to gain trac-
tion in new areas, including program synthesis [6]. Deep
learning (DL) in particular has driven progress in these areas
2 . 1
# 0 83
T FEFJ B
• 8N 9
• m efc
• pl iSr~ ]
• 2 G AE 8N ( 9
• . / wSrbnS o [ ” i ud
• S lg ksc 8 ( N9 00 8 ( N9
• ” i ud
•
• , 1 BIFE am efc “ TP F JL IJ JAFEP P FM P
• A lp T 7MG I E F DDAE 2 E IP F P7 2P
•
• 2 y tS 8 )N9
U
•
• G
D N
+C P + NG
• B K TC O M aSBD
B e T LGN
•
• C
• C e
• C LGN DX
(MAPL’18, June 18, 2018, Philadelphia, PA, USA
Frameworks
Computational Graph
High level Data-flow Rewriting
Tensor Operator Description
Schedule
LLVMAccelerators CUDA/Metal/OpenCL
CNTK
CoreML These graphs are easy to optimize
struct programs in a deeply-embe
guage (eDSL) without high-level a
A more expressive style popular
works like Chainer, PyTorch, and G
tion of graphs with dynamic topol
runtime data and support diere
tive computations. This expressiv
user but has limited the ability fo
optimize user-dened graphs. Mo
requires a Python interpreter, ma
accelerators and FPGAs extremely
In summary, static graphs are e
the expressivity found in higher-l
M I
I M H M N
I
• R
• MN
•
• V J
• I
•
programs’ computational expressivity. Frameworks like Ten-
sorFlow represent dierentiable computation using static
graphs, which are dataow graphs with a xed topology.
Relay
Fusion, Layout Change, Partial Eval,
Traditional Optimizations
Tensor Operator Description
Schedule
Hardware Implementation
Frameworks
CNTK
CoreML
Relay Python Decorator
Operators
Relay runtime
system
Control
Figure 2. The new TVM stack integrated with Relay.
t
a
2
C
a
c
f
w
h
w
programs’ computational expressivity. Frameworks like Ten-
sorFlow represent dierentiable computation using static
graphs, which are dataow graphs with a xed topology.
Relay
Fusion, Layout Change, Partial Eval,
Traditional Optimizations
Tensor Operator Description
Schedule
Hardware Implementation
Frameworks
CNTK
CoreML
Relay Python Decorator
Operators
Relay runtime
system
Control
Figure 2. The new TVM stack integrated with Relay.
t
a
2
C
a
c
f
w
h
w
S
RT
IMT
D
L
1 A1 4
1 1 1 3 4
.D 4 4
4 3
A 1 2 554 4 1
D 4 .D 4
A A 4
A 4 .D 4
E1
. 5 1 4 44
A 4 21 22A 12D
D 4 .D 4 4
LGF
A 7 C2 C
• 1 C
• +.
• 1 2C A A
F A 1 -
• 6 6 . 1
• . A : :
• / C 6
. 1
• - 1 1AA
• 1
• / + 6
• -1 1 TV I MT
• T
RIP
:CC C: 6 C 7
• A7 ,
• 1
• CA + F
• CA3 C
• hw e
• C 7
• 7 3 a i pf
w lfs P
a d
• C 7 hw ,
• 1 - 67
• T R wfn b O
c 7 3 e
v / C:
• Iit m
• ,. uon O O
M gry l O 7 7 C
e Vd bc
• i b
• , , , . . . ., E r ols
y r ols , . : N
• w“ C ogD LC D
• r om r omw“ P w“ f
• u t
• r an
d escp
• C D f
• C D
• /
• 2
•
) ) 1
@relay_model
def lenet(x: Tensor[Float, (1, 28, 28)]) - Tensor[Float, 10]:
conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1],
no_bias=False) tanh1 = relay.tanh(conv1)
pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1])
conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1],
no_bias=False) tanh2 = relay.tanh(conv2)
pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1],
strides=[1, 2, 2, 1]) flatten = relay.flatten_layer(pool2)
fc1 = relay.linear(flatten, num_hidden=500)
tanh3 = relay.tanh(fc1)
return relay.linear(tanh3, num_hidden=10)
) ) 2
@relay
def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) - Float:
return relay.softmax_cross_entropy(lenet(x), y)
@relay
def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) - Model:
model = relay.create_model(lenet)
for x, y in data:
model_grad = relay.grad(model, loss, (x, y))
relay.update_model_params(model, model_grad)
return relay.export_model(model)
training_data, test_data = relay.datasets.mnist()
model = train_lenet(training_data)
print(relay.argmax(model(test_data[0])))
•
•  + : af
• t k k
• m t y nR + ,
• d S pD
• ) : + , u f c e
s
• n b
• ( - - :
• W g l i h n “ or
u gas
• -. -d oauSN V
• rw avV
• 21T Dr S M Tfgic
• V Vua gasSJ P L JP T
• nih 3 2 3 2 3 1
, 3 2 2 22
• f ir ni dDm
• y VnDet M I fl
N n
• N bN IoH PR n t d
N kyN e aE 5 4 4 . -
• c a s 5 T gi
N 4 N e a
p fN
• 4 5 4 7 4 Ip e a
• ,K
• r N hNi T aS KIRP L f
• i T a N lN O
• M
• 2 871 0
•
7 /17 + + C 2
• 1
• / 0 /
【論文紹介】Relay: A New IR for Machine Learning Frameworks

More Related Content

What's hot

TensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPUTensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPU
Te-Yen Liu
 
TensorFlow Study Part I
TensorFlow Study Part ITensorFlow Study Part I
TensorFlow Study Part I
Te-Yen Liu
 
Slide tesi
Slide tesiSlide tesi
Slide tesi
Nicolò Savioli
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017
Te-Yen Liu
 
C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략 C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략
명신 김
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
David Walker
 
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via RandomizationICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
Hidekazu Oiwa
 
Modified "Why MacRuby Matters"
Modified "Why MacRuby Matters"Modified "Why MacRuby Matters"
Modified "Why MacRuby Matters"
Sean McCune
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
Daniel Lemire
 
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
MITSUNARI Shigeo
 
WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装
MITSUNARI Shigeo
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
Fast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in PracticeFast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in Practice
Rakuten Group, Inc.
 
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモDNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
Shinya Takamaeda-Y
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Rakuten Group, Inc.
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
Daniel Lemire
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
Rakuten Group, Inc.
 
Range reader/writer locking for the Linux kernel
Range reader/writer locking for the Linux kernelRange reader/writer locking for the Linux kernel
Range reader/writer locking for the Linux kernel
Davidlohr Bueso
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
Alex Pruden
 
snarks <3 hash functions
snarks <3 hash functionssnarks <3 hash functions
snarks <3 hash functions
Rebekah Mercer
 

What's hot (20)

TensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPUTensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPU
 
TensorFlow Study Part I
TensorFlow Study Part ITensorFlow Study Part I
TensorFlow Study Part I
 
Slide tesi
Slide tesiSlide tesi
Slide tesi
 
Caffe studying 2017
Caffe studying 2017Caffe studying 2017
Caffe studying 2017
 
C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략 C++ AMP 실천 및 적용 전략
C++ AMP 실천 및 적용 전략
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via RandomizationICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
 
Modified "Why MacRuby Matters"
Modified "Why MacRuby Matters"Modified "Why MacRuby Matters"
Modified "Why MacRuby Matters"
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
 
WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Fast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in PracticeFast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in Practice
 
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモDNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
DNNのモデル特化ハードウェアを生成するオープンソースコンパイラNNgenのデモ
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
 
Range reader/writer locking for the Linux kernel
Range reader/writer locking for the Linux kernelRange reader/writer locking for the Linux kernel
Range reader/writer locking for the Linux kernel
 
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
zkStudyClub: PLONKUP & Reinforced Concrete [Luke Pearson, Joshua Fitzgerald, ...
 
snarks <3 hash functions
snarks <3 hash functionssnarks <3 hash functions
snarks <3 hash functions
 

Similar to 【論文紹介】Relay: A New IR for Machine Learning Frameworks

Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
NECST Lab @ Politecnico di Milano
 
Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)
Aijun Zhang
 
Predicting organic reaction outcomes with weisfeiler lehman network
Predicting organic reaction outcomes with weisfeiler lehman networkPredicting organic reaction outcomes with weisfeiler lehman network
Predicting organic reaction outcomes with weisfeiler lehman network
Kazuki Fujikawa
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
Preferred Networks
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
IJPEDS-IAES
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
MLconf
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Naoki (Neo) SATO
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore
illidan2004
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab
Arshit Rai
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
Jo-fai Chow
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
inside-BigData.com
 
Digital_system_design_A (1).ppt
Digital_system_design_A (1).pptDigital_system_design_A (1).ppt
Digital_system_design_A (1).ppt
BUCHUPALLIVIMALAREDD2
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab
Arshit Rai
 
Matlab workshop
Matlab workshopMatlab workshop
Matlab workshop
محمدعبد الحى
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
Databricks
 
Icsm07.ppt
Icsm07.pptIcsm07.ppt
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
VasileiosMezaris
 
spparksUpdates
spparksUpdatesspparksUpdates
spparksUpdates
Justin Roberts
 

Similar to 【論文紹介】Relay: A New IR for Machine Learning Frameworks (20)

Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)Dual-time Modeling and Forecasting in Consumer Banking (2016)
Dual-time Modeling and Forecasting in Consumer Banking (2016)
 
Predicting organic reaction outcomes with weisfeiler lehman network
Predicting organic reaction outcomes with weisfeiler lehman networkPredicting organic reaction outcomes with weisfeiler lehman network
Predicting organic reaction outcomes with weisfeiler lehman network
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
Co-Simulation Interfacing Capabilities in Device-Level Power Electronic Circu...
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Map-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on MulticoreMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
Achitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and ExascaleAchitecture Aware Algorithms and Software for Peta and Exascale
Achitecture Aware Algorithms and Software for Peta and Exascale
 
Digital_system_design_A (1).ppt
Digital_system_design_A (1).pptDigital_system_design_A (1).ppt
Digital_system_design_A (1).ppt
 
Summer training matlab
Summer training matlab Summer training matlab
Summer training matlab
 
Matlab workshop
Matlab workshopMatlab workshop
Matlab workshop
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
 
Icsm07.ppt
Icsm07.pptIcsm07.ppt
Icsm07.ppt
 
LSTM Structured Pruning
LSTM Structured PruningLSTM Structured Pruning
LSTM Structured Pruning
 
spparksUpdates
spparksUpdatesspparksUpdates
spparksUpdates
 

Recently uploaded

CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
PKavitha10
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
MiscAnnoy1
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
Mahmoud Morsy
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
AjmalKhan50578
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
architagupta876
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 

Recently uploaded (20)

CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Certificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi AhmedCertificates - Mahmoud Mohamed Moursi Ahmed
Certificates - Mahmoud Mohamed Moursi Ahmed
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 

【論文紹介】Relay: A New IR for Machine Learning Frameworks

  • 1. Relay: A New IR for Machine Learning Frameworks Jared Roesch Steven Lyubomirsky Logan Weber Josh Pollock jroesch@cs.uw.edu sslyu@cs.uw.edu weberlo@cs.uw.edu joshpoll@cs.uw.edu Marisa Kirisame Tianqi Chen Zachary Tatlock jerry96@cs.uw.edu tqchen@cs.uw.edu ztatlock@cs.uw.edu Paul G. Allen School of Computer Science and Engineering University of Washington, Seattle, WA, USA Abstract Machine learning powers diverse services in industry in- cluding search, translation, recommendation systems, and security. The scale and importance of these models require that they be ecient, expressive, and portable across an ar- ray of heterogeneous hardware devices. These constraints 1 Introduction Machine learning (ML) has dramatically reshaped computer vision [17, 21], natural language processing, robotics [14], and computational biology and is continuing to gain trac- tion in new areas, including program synthesis [6]. Deep learning (DL) in particular has driven progress in these areas 2 . 1 # 0 83
  • 2.
  • 3. T FEFJ B • 8N 9 • m efc • pl iSr~ ] • 2 G AE 8N ( 9 • . / wSrbnS o [ ” i ud • S lg ksc 8 ( N9 00 8 ( N9 • ” i ud • • , 1 BIFE am efc “ TP F JL IJ JAFEP P FM P • A lp T 7MG I E F DDAE 2 E IP F P7 2P • • 2 y tS 8 )N9
  • 5. • B K TC O M aSBD B e T LGN • • C • C e • C LGN DX
  • 6. (MAPL’18, June 18, 2018, Philadelphia, PA, USA Frameworks Computational Graph High level Data-flow Rewriting Tensor Operator Description Schedule LLVMAccelerators CUDA/Metal/OpenCL CNTK CoreML These graphs are easy to optimize struct programs in a deeply-embe guage (eDSL) without high-level a A more expressive style popular works like Chainer, PyTorch, and G tion of graphs with dynamic topol runtime data and support diere tive computations. This expressiv user but has limited the ability fo optimize user-dened graphs. Mo requires a Python interpreter, ma accelerators and FPGAs extremely In summary, static graphs are e the expressivity found in higher-l M I I M H M N I
  • 7. • R • MN • • V J • I •
  • 8. programs’ computational expressivity. Frameworks like Ten- sorFlow represent dierentiable computation using static graphs, which are dataow graphs with a xed topology. Relay Fusion, Layout Change, Partial Eval, Traditional Optimizations Tensor Operator Description Schedule Hardware Implementation Frameworks CNTK CoreML Relay Python Decorator Operators Relay runtime system Control Figure 2. The new TVM stack integrated with Relay. t a 2 C a c f w h w
  • 9. programs’ computational expressivity. Frameworks like Ten- sorFlow represent dierentiable computation using static graphs, which are dataow graphs with a xed topology. Relay Fusion, Layout Change, Partial Eval, Traditional Optimizations Tensor Operator Description Schedule Hardware Implementation Frameworks CNTK CoreML Relay Python Decorator Operators Relay runtime system Control Figure 2. The new TVM stack integrated with Relay. t a 2 C a c f w h w S RT IMT D L
  • 10. 1 A1 4 1 1 1 3 4 .D 4 4 4 3 A 1 2 554 4 1 D 4 .D 4 A A 4 A 4 .D 4 E1 . 5 1 4 44 A 4 21 22A 12D D 4 .D 4 4 LGF
  • 11. A 7 C2 C • 1 C • +. • 1 2C A A F A 1 - • 6 6 . 1 • . A : : • / C 6 . 1 • - 1 1AA • 1 • / + 6 • -1 1 TV I MT • T RIP
  • 12. :CC C: 6 C 7 • A7 , • 1 • CA + F • CA3 C • hw e • C 7 • 7 3 a i pf w lfs P a d • C 7 hw , • 1 - 67 • T R wfn b O c 7 3 e v / C: • Iit m • ,. uon O O M gry l O 7 7 C e Vd bc
  • 13. • i b • , , , . . . ., E r ols y r ols , . : N • w“ C ogD LC D • r om r omw“ P w“ f • u t • r an d escp • C D f • C D
  • 15. ) ) 1 @relay_model def lenet(x: Tensor[Float, (1, 28, 28)]) - Tensor[Float, 10]: conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1], no_bias=False) tanh1 = relay.tanh(conv1) pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1]) conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1], no_bias=False) tanh2 = relay.tanh(conv2) pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1]) flatten = relay.flatten_layer(pool2) fc1 = relay.linear(flatten, num_hidden=500) tanh3 = relay.tanh(fc1) return relay.linear(tanh3, num_hidden=10)
  • 16. ) ) 2 @relay def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) - Float: return relay.softmax_cross_entropy(lenet(x), y) @relay def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) - Model: model = relay.create_model(lenet) for x, y in data: model_grad = relay.grad(model, loss, (x, y)) relay.update_model_params(model, model_grad) return relay.export_model(model) training_data, test_data = relay.datasets.mnist() model = train_lenet(training_data) print(relay.argmax(model(test_data[0])))
  • 17. • • + : af • t k k • m t y nR + , • d S pD • ) : + , u f c e s • n b • ( - - : • W g l i h n “ or
  • 18. u gas • -. -d oauSN V • rw avV • 21T Dr S M Tfgic • V Vua gasSJ P L JP T • nih 3 2 3 2 3 1 , 3 2 2 22 • f ir ni dDm • y VnDet M I fl
  • 19. N n • N bN IoH PR n t d N kyN e aE 5 4 4 . - • c a s 5 T gi N 4 N e a p fN • 4 5 4 7 4 Ip e a • ,K • r N hNi T aS KIRP L f • i T a N lN O
  • 20. • M • 2 871 0 • 7 /17 + + C 2
  • 21. • 1 • / 0 /