
- 1. Chainer v3
  - Chainer Meetup #06 @ PFN, Sep. 30, 2017
  - Seiya Tokui @ Preferred Networks
- 2. Recent/coming releases
  - Chainer v3.0.0 RC, v2.1.0: Sep. 12
    - v3 RC was the 50th release!
    - CuPy v2.0.0 RC, v1.0.3 on the same day
  - Next release: Chainer v3.0.0 and v4.0.0α on Oct. 17
    - CuPy v2.0.0 and v3.0.0α on the same day
  - Today I will mainly talk about the features of CuPy v2.0.0 RC and Chainer v3.0.0 RC
- 3. Chainer v3.0.0rc1
  - For most users, backward compatibility is maintained
    - See the release notes of v3.0.0rc1 for some small breaks that do not affect most users
  - The internals have changed greatly
    - Existing code that directly touches the computational graph may break
  - Thanks to this change, we now support double backprop (a.k.a. gradient of gradients), as announced
- 4. Double backprop
  - Automatic backpropagation through gradients
  - When is it needed?
    - Consider a loss function that includes a gradient computation as a term/factor
    - E.g. the loss function for WGAN-GP: $\mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]$
    - To take the gradient of this loss function, we need to do backprop through the gradient $\nabla_{\hat{x}} D(\hat{x})$, which we also want to compute with backprop!
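
To make the "gradient inside the loss" idea concrete, here is a minimal pure-Python sketch. It assumes a hypothetical one-parameter critic D(x) = w * x**2, which is not from the slides. The penalty depends on w only through dD/dx, so its derivative with respect to w is a derivative of a derivative. The hand-derived value is checked against a finite difference.

```python
def critic_grad_x(w, x):
    """dD/dx for the toy critic D(x) = w * x**2 (hypothetical example)."""
    return 2.0 * w * x


def penalty(w, x):
    """Gradient penalty (|dD/dx| - 1)**2; it depends on w via the gradient."""
    return (abs(critic_grad_x(w, x)) - 1.0) ** 2


def penalty_grad_w(w, x):
    """Hand-derived d(penalty)/dw, obtained by differentiating through dD/dx."""
    g = critic_grad_x(w, x)
    sign = 1.0 if g >= 0 else -1.0
    # Chain rule: 2 * (|g| - 1) * sign(g) * dg/dw, where dg/dw = 2 * x
    return 2.0 * (abs(g) - 1.0) * sign * 2.0 * x


def finite_diff(f, w, x, eps=1e-6):
    """Central finite difference of f with respect to w."""
    return (f(w + eps, x) - f(w - eps, x)) / (2.0 * eps)


w, x = 0.75, 1.5
analytic = penalty_grad_w(w, x)      # 2 * (2.25 - 1) * 1 * 3 = 7.5
numeric = finite_diff(penalty, w, x)
print(analytic, numeric)
```

Double backprop automates the hand derivation in `penalty_grad_w` for arbitrary networks.
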
- 5. Double backprop in Chainer v3
  - Many functions now support double backprop
    - Those functions are rewritten to implement a new interface named FunctionNode (such functions are called new-style Functions)
    - backward() takes and returns Variable instead of ndarray for grad_outputs and the return values, which means backward() itself can be differentiated
  - Variable now has an attribute grad_var, which represents the gradient as a Variable (so that we can use it in the computational graph)
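
The point that backward() can be differentiated once it works on graph variables can be sketched with a toy scalar autodiff. The `Var` class and `grad` function below are hypothetical stand-ins written for this illustration, not Chainer's `Variable` or `FunctionNode`. Each local backward rule is written with `Var` operations, so the returned gradient is itself a node in a graph and can be differentiated again.

```python
class Var:
    """Scalar graph node (toy stand-in, not Chainer's Variable)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        # Each entry: (parent Var, function mapping upstream grad Var -> grad Var)
        self.parents = parents

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value,
                   ((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        other = as_var(other)
        # Backward rules use Var arithmetic, so gradients stay in the graph
        return Var(self.value * other.value,
                   ((self, lambda g: g * other), (other, lambda g: g * self)))


def as_var(x):
    return x if isinstance(x, Var) else Var(x)


def grad(output, wrt):
    """Return d(output)/d(wrt) as a Var by summing over all graph paths."""
    def visit(node, upstream):
        if node is wrt:
            return upstream
        total = Var(0.0)
        for parent, local_backward in node.parents:
            total = total + visit(parent, local_backward(upstream))
        return total
    return visit(output, Var(1.0))


x = Var(3.0)
y = x * x * x          # y = x**3
dy = grad(y, x)        # 3 * x**2, still a Var
d2y = grad(dy, x)      # 6 * x, the gradient of the gradient
print(dy.value, d2y.value)  # prints 27.0 18.0
```

The path-summing traversal is exponential in the worst case and is chosen only for brevity; a real implementation walks the graph once in topological order.
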
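- 6. How to implement WGAN-GP, 1: using Variable.backward()

      x_tilde = generator(z)
      x_hat = x + u * (x_tilde - x)
      D(x_hat).backward(enable_double_backprop=True)  # 1st diff
      gp = lam * (x_hat.grad_var - 1) ** 2  # lam is the coefficient λ (`lambda` is a reserved word in Python)
      loss = D(x_tilde) - D(x) + gp
      model.cleargrads()  # to clear the 1st diff of params
      loss.backward()  # 2nd diff

  - The snippet is schematic: the L2 norm of the gradient and the expectations in the loss are left out.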
- 7. How to implement WGAN-GP, 2: using grad()

      x_tilde = generator(z)
      x_hat = x + u * (x_tilde - x)
      gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)  # 1st diff
      gp = lam * (gx_hat - 1) ** 2  # lam is the coefficient λ (`lambda` is a reserved word in Python)
      loss = D(x_tilde) - D(x) + gp
      loss.backward()  # 2nd diff

  - This version is more efficient because grad() can skip the gradient computation for parameters, so cleargrads() can also be dropped.
- 8. New-style Function support
  - Most “standard” functions are now ported to the new-style interface: +, -, *, Convolution2D, Deconvolution2D, EmbedID, Linear, LSTM, BatchNormalization, sigmoid, relu, leaky_relu, softmax, log_softmax, tanh, exp, mean_squared_error, softmax_cross_entropy, dropout, layer_normalization, transpose, reshape, broadcast_to, sum, concat, __getitem__, etc.
  - We are still working on widening the double backprop support. Contributions are also welcome!
- 9. Other features
  - Functions: layer_normalization, selu, arctan2, prod, NumPy-compatible matmul
  - Links: ChildSumTreeLSTM, NaryTreeLSTM, BatchRenormalization
  - Other new features: LeCunNormal, as_variable(), Variable.array, strict option of load_npz(), etc.
- 10. CuPy v2.0.0rc1
  - Sparse matrix support
  - Complex number support
  - Improved memory allocator
  - Many new functions, especially linear algebra routines
- 11. Sparse matrix support
  - cupy.sparse: sparse matrix support with APIs compatible with scipy.sparse
  - CSR/CSC/COO and diagonal formats
  - Basic arithmetic, matrix product, element indexing
  - Slicing along the major axis
  - Dense <-> Sparse conversion
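
As an illustration of the CSR format named above, the following pure-Python sketch shows dense-to-CSR conversion, a matrix-vector product, and why slicing along the major (row) axis is cheap. The helper names are hypothetical and this is not how cupy.sparse is implemented; it only shows the storage layout (`data`, `indices`, `indptr`).

```python
def dense_to_csr(dense):
    """Convert a list-of-rows matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # end of this row's entries
    return data, indices, indptr


def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR matrix by a dense vector."""
    out = []
    for r in range(len(indptr) - 1):
        start, end = indptr[r], indptr[r + 1]
        out.append(sum(data[k] * vec[indices[k]] for k in range(start, end)))
    return out


def csr_row_slice(data, indices, indptr, start_row, stop_row):
    """Rows start_row..stop_row-1: one contiguous slice of data/indices."""
    lo, hi = indptr[start_row], indptr[stop_row]
    new_indptr = [p - lo for p in indptr[start_row:stop_row + 1]]
    return data[lo:hi], indices[lo:hi], new_indptr


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
data, indices, indptr = dense_to_csr(dense)
product = csr_matvec(data, indices, indptr, [1, 1, 1])
last_two_rows = csr_row_slice(data, indices, indptr, 1, 3)
print(data, indices, indptr, product)
```

Because each row's entries are stored contiguously, a row slice only needs two offsets from `indptr`, whereas a column slice would have to scan every row.
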
- 12. Complex number support
  - CuPy now supports complex numbers!
  - Dtypes complex64 and complex128 are now available
  - Routines related to complex numbers: angle, conj, imag, real
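
The four routines listed above have the same names in NumPy. The sketch below uses NumPy so that it runs without a GPU; that replacing `numpy` with `cupy` gives the same results on a GPU array is an assumption based on CuPy's NumPy-compatible interface and is not verified here.

```python
import numpy as np

# Two sample complex values stored as complex64
z = np.array([1 + 1j, -2j], dtype=np.complex64)

real_part = np.real(z)    # real components
imag_part = np.imag(z)    # imaginary components
conjugate = np.conj(z)    # complex conjugate
phase = np.angle(z)       # argument in radians

print(real_part, imag_part, conjugate, phase)
```
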
- 13. Linear algebra routines
  - Solvers, matrix inversion, determinant, eigenvalues, etc.: solve, tensorsolve, inv, pinv, det, slogdet, eigh, eigvalsh, matrix_rank
    - All under the cupy.linalg namespace
  - einsum is also supported (thanks, @fukatani!)
    - Flexible tensor product/reduction based on the Einstein summation convention
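
A short sketch of a few of these routines follows. It is written against NumPy so that it runs on a CPU; the slide states that the same names live under cupy.linalg, and using `cupy` in place of `numpy` here is assumed rather than tested.

```python
import numpy as np

a = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.linalg.solve(a, b)              # solves a @ x = b
sign, logdet = np.linalg.slogdet(a)    # sign and log|det(a)|
eigenvalues = np.linalg.eigvalsh(a)    # eigenvalues of a symmetric matrix

# einsum: index strings describe which axes are multiplied and summed
matmul_via_einsum = np.einsum("ij,jk->ik", a, a)   # matrix product
trace_via_einsum = np.einsum("ii->", a)            # sum of the diagonal

print(x, sign, logdet, eigenvalues, trace_via_einsum)
```
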
- 14. Improved memory allocator
  - The memory pool is greatly improved
    - It now uses a “best-fit with coalescing” algorithm
    - A memory region is reused even if the size does not exactly match
    - It may also improve speed, thanks to the reduced number of reallocations
  - Example: the new seq2seq example originally used all the memory of a 12 GB GPU; its usage is now reduced to 3 GB, and the execution time is reduced by approx. 25%.
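
The general idea of best-fit with coalescing can be sketched with a toy free-list allocator over a single contiguous region. The `BestFitPool` class is hypothetical and is not CuPy's memory pool; it only shows the two ingredients: picking the smallest free chunk that fits (and splitting off the remainder), and merging adjacent free chunks when memory is returned.

```python
class BestFitPool:
    """Toy allocator over one region of `size` units (illustrative only)."""

    def __init__(self, size):
        self.free = [(0, size)]   # sorted list of (offset, size) free chunks
        self.used = {}            # offset -> size of allocated chunks

    def malloc(self, size):
        # Best fit: the smallest free chunk that is large enough
        candidates = [c for c in self.free if c[1] >= size]
        if not candidates:
            raise MemoryError("no free chunk large enough")
        offset, chunk_size = min(candidates, key=lambda c: c[1])
        self.free.remove((offset, chunk_size))
        if chunk_size > size:
            # Split: keep the unused tail available for later requests
            self.free.append((offset + size, chunk_size - size))
            self.free.sort()
        self.used[offset] = size
        return offset

    def free_chunk(self, offset):
        size = self.used.pop(offset)
        self.free.append((offset, size))
        self.free.sort()
        # Coalesce: merge free chunks that touch each other
        merged = [self.free[0]]
        for off, sz in self.free[1:]:
            last_off, last_sz = merged[-1]
            if last_off + last_sz == off:
                merged[-1] = (last_off, last_sz + sz)
            else:
                merged.append((off, sz))
        self.free = merged


pool = BestFitPool(100)
a = pool.malloc(30)
b = pool.malloc(30)
c = pool.malloc(40)
pool.free_chunk(a)
pool.free_chunk(b)      # the two freed chunks merge into one of size 60
d = pool.malloc(50)     # fits only because of coalescing
print(d, pool.free)     # prints 0 [(50, 10)]
```

Without the merge step, the request for 50 units would fail even though 60 units are free, which is the kind of fragmentation that forces reallocation.
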
- 15. Next versions
  - As you may know, we slightly changed the release policy again; stable releases may now include some new features (thus v2.1.0 instead of v2.0.3).
  - v4 is scheduled based on our release policy: v4.0.0 will be released three months after v3.0.0 (mid-January if there is no delay).
  - The core features of v4 are not determined yet; let’s have discussions!
