Relay: A New IR for Machine Learning Frameworks
Jared Roesch Steven Lyubomirsky Logan Weber Josh Pollock
jroesch@cs.uw.edu sslyu@cs.uw.edu weberlo@cs.uw.edu joshpoll@cs.uw.edu
Marisa Kirisame Tianqi Chen Zachary Tatlock
jerry96@cs.uw.edu tqchen@cs.uw.edu ztatlock@cs.uw.edu
Paul G. Allen School of Computer Science and Engineering
University of Washington, Seattle, WA, USA
Abstract
Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints ...
1 Introduction
Machine learning (ML) has dramatically reshaped computer vision [17, 21], natural language processing, robotics [14], and computational biology, and is continuing to gain traction in new areas, including program synthesis [6]. Deep learning (DL) in particular has driven progress in these areas ...
[Figure: the existing TVM stack — frameworks (CNTK, CoreML, ...) → computational graph → high-level data-flow rewriting → tensor operator description → schedule → LLVM / CUDA/Metal/OpenCL / accelerators.]

Frameworks like TensorFlow represent differentiable computation using static graphs, which are dataflow graphs with a fixed topology. These graphs are easy to optimize, but users must construct programs in a deeply-embedded domain-specific language (eDSL) without high-level abstractions. A more expressive style, popularized by frameworks like Chainer, PyTorch, and Gluon, is the construction of graphs with dynamic topologies that can depend on runtime data and support differing iterative computations. This expressivity is convenient for the user but has limited frameworks' ability to analyze and optimize user-defined graphs. Moreover, executing such graphs requires a Python interpreter, making deployment to new accelerators and FPGAs extremely difficult. In summary, static graphs are easy to optimize but lack the expressivity found in higher-level languages, while dynamic graphs offer that expressivity at the cost of optimization and deployment.
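To make this static-versus-dynamic contrast concrete, the sketch below (our illustration, not code from the paper; the toy node encoding and helper names are invented) builds a fixed-topology dataflow graph next to a computation whose graph shape depends on runtime data:

    import numpy as np

    # --- Static-graph style: the topology is fixed before any data is seen. ---
    # Nodes are declared up front; an executor later walks the same graph for
    # every input, which makes whole-graph optimization straightforward.
    STATIC_GRAPH = [
        ("matmul", "x", "W"),   # y0 = x @ W
        ("relu",   "y0"),       # y1 = relu(y0)
        ("matmul", "y1", "V"),  # y2 = y1 @ V
    ]

    def run_static(graph, env):
        # A toy interpreter over the fixed node list.
        vals = dict(env)
        for i, node in enumerate(graph):
            if node[0] == "matmul":
                out = vals[node[1]] @ vals[node[2]]
            elif node[0] == "relu":
                out = np.maximum(vals[node[1]], 0.0)
            vals[f"y{i}"] = out
        return vals[f"y{len(graph) - 1}"]

    # --- Dynamic-graph style: the topology depends on runtime data. ---
    def run_dynamic(x, W, steps):
        # The number of matmul nodes is only known once `steps` (runtime data)
        # arrives, so there is no single fixed graph to optimize ahead of time.
        y = x
        for _ in range(int(steps)):
            y = np.maximum(y @ W, 0.0)
        return y

    x = np.random.randn(1, 4)
    W = np.random.randn(4, 4)
    V = np.random.randn(4, 2)
    print(run_static(STATIC_GRAPH, {"x": x, "W": W, "V": V}).shape)  # (1, 2)
    print(run_dynamic(x, W, steps=3).shape)                          # (1, 4)

A compiler can specialize and fuse the static node list ahead of time, whereas the dynamic version only reveals its structure as it runs; this is the trade-off Relay targets.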
Figure 2. The new TVM stack integrated with Relay: frameworks (CNTK, CoreML, ...) enter through the Relay Python decorator; Relay applies fusion, layout change, partial evaluation, and traditional optimizations; tensor operator descriptions and schedules are then lowered to the hardware implementation, while the Relay runtime system handles operators and control.
Code example 1: defining LeNet with the Relay Python decorator.
@relay_model
def lenet(x: Tensor[Float, (1, 28, 28)]) -> Tensor[Float, 10]:
    conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1], no_bias=False)
    tanh1 = relay.tanh(conv1)
    pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1], no_bias=False)
    tanh2 = relay.tanh(conv2)
    pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    flatten = relay.flatten_layer(pool2)
    fc1 = relay.linear(flatten, num_hidden=500)
    tanh3 = relay.tanh(fc1)
    return relay.linear(tanh3, num_hidden=10)
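As a sanity check on the type annotations above, the following sketch (ours, not part of the Relay API; it assumes "valid" convolutions with unit stride and 2x2 max pooling with stride 2) traces the intermediate shapes of lenet by hand:

    # Shape walk-through for lenet. Channel counts come from num_filter;
    # spatial sizes follow out = (in - kernel) / stride + 1.
    def conv2d_shape(chw, num_filter, k):
        _, h, w = chw
        return (num_filter, h - k + 1, w - k + 1)

    def max_pool_shape(chw, k=2, stride=2):
        c, h, w = chw
        return (c, (h - k) // stride + 1, (w - k) // stride + 1)

    s = (1, 28, 28)                      # input:   Tensor[Float, (1, 28, 28)]
    s = conv2d_shape(s, 20, 5)           # conv1:   (20, 24, 24)
    s = max_pool_shape(s)                # pool1:   (20, 12, 12)
    s = conv2d_shape(s, 50, 5)           # conv2:   (50, 8, 8)
    s = max_pool_shape(s)                # pool2:   (50, 4, 4)
    flat = s[0] * s[1] * s[2]            # flatten: 800
    print(s, flat)                       # fc1 maps 800 -> 500, output is 10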
Code example 2: training LeNet end to end in Relay.
@relay
def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) -> Float:
    return relay.softmax_cross_entropy(lenet(x), y)

@relay
def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) -> Model:
    model = relay.create_model(lenet)
    for x, y in training_data:
        model_grad = relay.grad(model, loss, (x, y))
        relay.update_model_params(model, model_grad)
    return relay.export_model(model)

training_data, test_data = relay.datasets.mnist()
model = train_lenet(training_data)
print(relay.argmax(model(test_data[0])))
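The training loop above leans on relay.grad and relay.update_model_params. Conceptually, each iteration computes the gradient of the loss with respect to the model parameters and takes a small step against it; a minimal NumPy sketch of that update on a toy linear model (our own illustration, not the Relay implementation) is:

    import numpy as np

    # One training step, conceptually: compute the gradient of a loss with
    # respect to the parameters, then step against it (plain SGD on a
    # least-squares linear model; this is not the Relay API).
    def loss_and_grad(W, x, y):
        err = x @ W - y
        loss = 0.5 * np.sum(err ** 2)
        grad = x.T @ err            # d(loss)/dW for the least-squares objective
        return loss, grad

    W = np.zeros((4, 1))
    x = np.random.randn(8, 4)
    y = x @ np.array([[1.0], [2.0], [3.0], [4.0]])

    for _ in range(100):
        loss, grad = loss_and_grad(W, x, y)
        W -= 0.01 * grad            # the "update_model_params" step

    print(round(loss, 4))           # the loss shrinks as W fits the data

In the Relay program, relay.grad plays the role of loss_and_grad (it returns the gradients of loss with respect to the model parameters) and relay.update_model_params applies the parameter step.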