Chainer v4 and v5
April 25, 2018 @ Preferred Networks
Seiya Tokui
Chainer v4
Performance
• Intel/chainer integration (iDeep)
• cuDNN enhancements (autotuner, TensorCore, NCCL2)
Usability
• chainer.Sequential
• Reorganized documentation
Deploy
• Caffe export
• ONNX-Chainer (as a separate package)
iDeep
iDeep (Intel Deep Learning Extension Package) is a module providing a collection of accelerated deep learning operations such as convolution, deconvolution, and ReLU. It uses Intel MKL-DNN as its acceleration engine. (GitHub: intel/ideep)
$ pip install ideep4py
$ export CHAINER_USE_IDEEP=auto
$ python your_script.py
→ CPU-mode computation becomes faster
[Benchmark chart omitted: each network tested with batch sizes 1 and 32.]
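iDeep can also be enabled from Python rather than via the environment variable. A minimal sketch, assuming Chainer v4 with ideep4py installed, using the v4 APIs to_intel64() and the use_ideep config:

import numpy as np
import chainer
import chainer.links as L

model = L.Linear(1000, 1000)
model.to_intel64()  # convert the parameters to iDeep arrays

# 'auto' falls back to NumPy for operations iDeep does not accelerate
with chainer.using_config('use_ideep', 'auto'):
    x = np.random.rand(32, 1000).astype(np.float32)
    y = model(x)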
cuDNN enhancements
• Autotuner chooses the best algorithm for each convolution based on measured timings.
  Usage: chainer.config.autotune = True
• TensorCore is used in convolutions when the requirements are met (e.g. Volta GPU, cuDNN 7+, FP16 inputs/weights).
  Usage: nothing to do
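Autotuning can also be enabled locally with the standard using_config idiom, so the benchmarking applies only where you want it (model and x are assumed to come from the surrounding code):

import chainer

with chainer.using_config('autotune', True):
    y = model(x)  # cuDNN convolution algorithms are chosen by measured timings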
Sequential chain (experimental)
• Sequential composition of multiple callables
• Behaves like ChainList: supports the Link interface and recognizes links in the sequence as children
import chainer
import chainer.functions as F
import chainer.links as L

model = chainer.Sequential(
    L.Linear(1000),
    F.relu,
    lambda x: F.dropout(x, 0.2),
    L.Linear(1000),
    F.relu,
    lambda x: F.dropout(x, 0.2),
    L.Linear(10),
)
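A quick usage sketch for the model above (the input size of the first Linear link is inferred on the first call; the batch and feature sizes here are arbitrary):

import numpy as np

x = np.zeros((8, 100), dtype=np.float32)
y = model(x)  # shape (8, 10); the three Linear links are registered as children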
Other important updates
• FP16 training support: loss scaling and FP32 updates (see the sketch below)
• Documentation has been reorganized. We still welcome any requests and feedback!
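A hedged sketch of loss scaling as exposed in v4, where backward() accepts a loss_scale argument: gradients are multiplied by the factor during backprop and divided back just before the parameter update (check the release notes for the exact interface):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

model = L.Linear(100, 10)
optimizer = chainer.optimizers.SGD()
optimizer.setup(model)

x = np.random.rand(32, 100).astype(np.float32)
t = np.random.randint(0, 10, size=32).astype(np.int32)

loss = F.softmax_cross_entropy(model(x), t)
model.cleargrads()
loss.backward(loss_scale=128.0)  # scale gradients to avoid FP16 underflow
optimizer.update()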
Caffe export / ONNX-Chainer
import numpy as np
import chainer
import chainer.links as L
from chainer.exporters import caffe
import onnx_chainer

model = L.VGG16Layers()
# Pseudo input
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
# Caffe export
caffe.export(model, [x], directory='.', graph_name='VGG16')
# ONNX-Chainer
chainer.config.train = False
onnx_chainer.export(model, x, filename='VGG16.onnx')
Chainer v5 – planned features
Usability
• NumPy-compat API
• Distributions support
• FP16 mode
• ChainerMN integration
• Baseline code generator
Performance
• Static subgraph caching
NumPy-compatible interface for Variable
# Today: wrap arrays in Variable and call chainer.functions
xp = ...
x = chainer.Variable(xp.array(...))
m = F.max(x, axis=1, keepdims=True)
x1 = x - F.broadcast_to(m, x.shape)
y = m + F.log(F.sum(F.exp(x1), axis=1, keepdims=True))

# Planned: a NumPy-compatible namespace (dnp) with implicit broadcasting
chainer.set_default_device('cuda:0')
x = dnp.array(...).require_grad()
m = x.max(axis=1, keepdims=True)
x1 = x - m
y = m + dnp.log(dnp.sum(dnp.exp(x1), axis=1, keepdims=True))
Note: the interface is not fixed yet!!!
Distribution implementations
• Differentiable computation of distribution-specific values
  • Probability density/mass at given points
  • Statistics, e.g. mean, var
  • Reparameterized sampling
• We are currently developing the core design. Once it is done, we will widen the collection of supported distributions. A hypothetical sketch follows.
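Purely illustrative sketch of what such an API could look like; the module path chainer.distributions and the names Normal, log_prob, mean, and sample are assumptions, since the design was not finalized at the time:

import numpy as np
import chainer

# Hypothetical API (names are assumptions, not the final design)
loc = chainer.Variable(np.zeros(3, dtype=np.float32))
scale = chainer.Variable(np.ones(3, dtype=np.float32))
d = chainer.distributions.Normal(loc, scale)

lp = d.log_prob(np.zeros(3, dtype=np.float32))  # differentiable log-density
m = d.mean                                      # differentiable statistics
z = d.sample()                                  # reparameterized sampling:
                                                # gradients flow to loc/scale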
FP16 mode
The default dtype (float32 in most places) will become modifiable via CHAINER_DTYPE and chainer.config.dtype, affecting e.g.:
• Initial weight values
• Dataset dtype
This feature makes it easy to start using FP16 without changing your code.
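A sketch of the planned usage, assuming the config accepts a NumPy dtype; only CHAINER_DTYPE and chainer.config.dtype come from the slide, the rest is illustration:

# Via the environment, without touching the script:
#   $ CHAINER_DTYPE=float16 python your_script.py

import numpy as np
import chainer
import chainer.links as L

chainer.config.dtype = np.float16  # planned: change the default dtype
model = L.Linear(1000)             # weights would be initialized as float16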
Other usability features
• ChainerMN integration
  • ChainerMN will be merged into the Chainer core repository/package
  • In the current plan, the interface and usage will not change much
• Baseline experiment code generator
  • Generates baseline code for your experiment with one or a few commands (like scaffolding in Ruby on Rails)
  • Makes it quicker to start experiments
Static subgraph caching
• Cache the graph at the first call and reuse it in later calls
• Removes the overhead of graph construction at each iteration
• Could also be used to optimize the computation
import chainer
import chainer.functions as F
import chainer.links as L

class NN(chainer.Chain):

    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(1000)
            self.l2 = L.Linear(1000)
            self.l3 = L.Linear(10)

    @static_graph  # planned decorator for static subgraph caching
    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)
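Usage sketch: the first call traces and caches the graph, and later calls replay it (static_graph is the planned decorator from the slide; input sizes here are arbitrary):

import numpy as np

nn = NN()
x = np.zeros((32, 784), dtype=np.float32)
y1 = nn(x)  # first call: the graph is constructed and cached
y2 = nn(x)  # later calls: the cached graph is reused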
Summary
• Chainer v4 includes CPU/GPU performance improvements, Sequential, more double-backprop support, reorganized documentation, etc.
• ONNX-Chainer also makes it easy to deploy your models
• We keep working on performance, and v5 moves toward much easier deep learning programming with less code