4. iDeep
Intel Deep Learning Extension Package (iDeep) is a module that provides a
collection of accelerated deep learning operations such as convolution,
deconvolution, ReLU, etc. It uses Intel MKL-DNN as its acceleration
engine. (GitHub: intel/ideep)
$ pip install ideep4py
$ export CHAINER_USE_IDEEP=auto
$ python your_script.py
→ CPU-mode computation runs faster
6. cuDNN enhancements
•Autotuner chooses the best algorithm for each convolution
based on measured timings.
Usage: chainer.config.autotune = True
•Tensor Cores are used in convolutions when the requirements
are met (e.g. Volta GPU, cuDNN 7+, FP16 inputs/weights)
Usage: nothing to do
7. Sequential chain (experimental)
•Sequential composition of multiple
callables
•Behaves like ChainList: supports
the Link interface and registers
links in the sequence as children
model = chainer.Sequential(
    L.Linear(1000),
    F.relu,
    lambda x: F.dropout(x, 0.2),
    L.Linear(1000),
    F.relu,
    lambda x: F.dropout(x, 0.2),
    L.Linear(10),
)
8. Other important updates
•FP16 training support: loss scaling
and FP32 updates
•Documentation has been
reorganized. We still welcome
requests and feedback!
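Loss scaling, mentioned above, keeps small FP16 gradients from underflowing: the loss is multiplied by a scale factor before backprop, and the gradients are divided by it again before the FP32 update of the master weights. A minimal NumPy sketch with toy numbers chosen so that the unscaled gradient underflows (the model and values are illustrative, not Chainer's implementation):

```python
import numpy as np

# Toy linear model y = w * x with squared-error loss; dL/dw = 2 * (y - t) * x.
scale = 1024.0                  # loss-scaling factor
w_fp32 = np.float32(0.5)        # FP32 master copy of the weight
x = np.float16(1e-4)
t = np.float16(0.0)

y = np.float16(w_fp32) * x      # forward pass in FP16

# Without scaling: 2 * 5e-5 * 1e-4 = 1e-8 is below the smallest FP16
# subnormal (~6e-8), so the gradient rounds to exactly zero.
grad_unscaled = np.float16(2.0) * (y - t) * x

# With scaling: backprop the gradient of (scale * L) instead, so the scale
# factor flows through the chain rule and keeps intermediates representable.
grad_scaled = np.float16(scale) * np.float16(2.0) * (y - t) * x

# Unscale in FP32 and update the FP32 master weight.
grad_fp32 = np.float32(grad_scaled) / np.float32(scale)
w_fp32 = w_fp32 - np.float32(0.1) * grad_fp32
```

The scaled gradient survives in FP16 while the unscaled one vanishes, which is exactly why the FP32 update uses the unscaled-in-FP32 value.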
12. NumPy-compatible interface for Variable
Current interface:
xp = ...
x = chainer.Variable(xp.array(...))
m = F.max(x, axis=1, keepdims=True)
x1 = x - F.broadcast_to(m, x.shape)
y = m + F.log(F.sum(F.exp(x1), axis=1, keepdims=True))
Planned interface:
chainer.set_default_device('cuda:0')
x = dnp.array(...).require_grad()
m = x.max(axis=1, keepdims=True)
x1 = x - m
y = m + dnp.log(dnp.sum(dnp.exp(x1), axis=1, keepdims=True))
Note: the interface is not fixed yet!
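Both snippets on this slide compute a numerically stable log-sum-exp along axis 1: subtract the row maximum before exponentiating so `exp` cannot overflow, then add it back after the log. The same computation in plain NumPy:

```python
import numpy as np

def logsumexp(x, axis=1):
    """Numerically stable log(sum(exp(x))) along `axis`, as on the slide."""
    m = x.max(axis=axis, keepdims=True)   # row max, subtracted to avoid overflow
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

x = np.array([[1000.0, 1000.0],
              [0.0, np.log(3.0)]])
y = logsumexp(x)
# The naive np.log(np.exp(x).sum(axis=1)) would overflow on the first row,
# since exp(1000) is far beyond float64 range.
```

The first row gives 1000 + log(2), the second log(1 + 3) = log(4).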
13. Distribution implementations
•Differentiable computations of distribution-specific values
• Probability density/mass at given points
• Statistics, e.g. mean, var
• Reparameterized sampling
•The core design is currently under development. Once it is done,
we will widen the collection of supported distributions.
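The three bullets can be illustrated with a minimal hand-rolled Gaussian. This is a plain-Python sketch, not Chainer's eventual distribution API: log-density at a point, distribution statistics, and reparameterized sampling, where a sample is written as mean + std * noise so that gradients can flow through the parameters.

```python
import math
import random

class Normal:
    """Minimal Gaussian sketch: log-density, statistics, reparameterized sampling."""
    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

    def log_prob(self, x):
        # log N(x; loc, scale^2)
        z = (x - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)

    @property
    def mean(self):
        return self.loc

    @property
    def variance(self):
        return self.scale ** 2

    def sample(self):
        # Reparameterization trick: sample = loc + scale * eps with eps ~ N(0, 1),
        # making the sample a differentiable function of loc and scale.
        eps = random.gauss(0.0, 1.0)
        return self.loc + self.scale * eps

d = Normal(loc=1.0, scale=2.0)
lp = d.log_prob(1.0)  # log-density at the mean: -log(scale) - 0.5*log(2*pi)
```

In a real framework the arithmetic inside `log_prob` and `sample` would run on differentiable variables, which is what makes these computations trainable.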
14. FP16 mode
The default dtype (float32 in most places) will be modifiable via
CHAINER_DTYPE and chainer.config.dtype, affecting e.g.:
•Initial weight values
•Dataset dtype
This feature makes it easy to start using FP16 without changing
your code.
15. Other usability features
•ChainerMN integration
• ChainerMN will be merged into the Chainer core repository/package
• In the current plan, the interface and usage will not change much
•Baseline experiment code generator
• Generates baseline code for your experiment with one or a few
commands (like scaffolding in Ruby on Rails)
• Makes it quicker to start experiments
16. Static subgraph caching
•Caches the graph at the first call and
reuses it in later calls
•Removes the overhead of graph
construction at each iteration
•Could also be used to optimize the
computation
class NN(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(1000)
            self.l2 = L.Linear(1000)
            self.l3 = L.Linear(10)

    @static_graph
    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)
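The cache-on-first-call idea behind the decorator can be sketched in plain Python. This is a simplification, not Chainer's actual mechanism, and all names here are hypothetical: the first call traces which operations run, and later calls replay the recorded list instead of re-tracing.

```python
class Tracer:
    """Stand-in tensor that records every op applied to it."""
    def __init__(self):
        self.ops = []

    def apply(self, fn):
        self.ops.append(fn)   # record the op instead of computing with it
        return self

def static_graph_sketch(forward):
    cache = []  # the cached "graph": a flat list of operations

    def wrapper(x):
        if not cache:
            t = Tracer()
            forward(t)            # first call: trace the graph once
            cache.extend(t.ops)
        h = x
        for op in cache:          # later calls: replay without re-tracing
            h = op(h)
        return h
    return wrapper

@static_graph_sketch
def net(x):
    # The body is written against the tracing helper; a real implementation
    # would trace ordinary define-by-run code instead.
    h = x.apply(lambda v: v * 2)
    return h.apply(lambda v: v + 1)
```

Calling `net(3)` traces once and evaluates the replayed ops; every later call skips straight to the replay, which is the overhead removal described above.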
17. Summary
•Chainer v4 includes CPU/GPU performance improvements,
Sequential, more double backprop support, reorganized
documentation, etc.
•ONNX-Chainer also makes it easy to deploy your models
•We keep working on performance improvements, and on making
deep learning with Chainer even easier, with less programming
required, in v5