
Chainer v3

An introduction to the new features of Chainer v3.0.0rc1 and CuPy v2.0.0rc1. Presented at Chainer Meetup #06 in Tokyo.



  1. Chainer v3. Chainer Meetup #06 @ PFN, Sep. 30, 2017. Seiya Tokui @ Preferred Networks
  2. Recent/coming releases • Chainer v3.0.0 RC and v2.1.0: Sep. 12 • v3 RC was the 50th release! • CuPy v2.0.0 RC and v1.0.3 were released on the same day • Next releases: Chainer v3.0.0 and v4.0.0α on Oct. 17 • CuPy v2.0.0 and v3.0.0α on the same day • Today I will mainly talk about the features of CuPy v2.0.0 RC and Chainer v3.0.0 RC
  3. Chainer v3.0.0rc1 • For most users, backward compatibility is maintained • See the release notes of v3.0.0rc1 for some small breaks that do not affect most users • The internals have changed substantially • Existing code that directly touches the computational graph may break • Thanks to this change, we now support double backprop (a.k.a. gradient of gradients), as announced
  4. Double backprop • Automatic backpropagation through gradients • When is it needed? • Consider a loss function that includes a gradient computation as a term/factor • E.g. the loss function for WGAN-GP: $\mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2]$ • To take the gradient of this loss function, we need to do backprop through $\nabla_{\hat{x}} D(\hat{x})$, which itself we want to compute with backprop!
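To see why the penalty term forces a second differentiation, a small worked case may help. The linear critic $D_w$ below is an assumption made purely for illustration; it does not appear in the slides.

```latex
\begin{aligned}
D_w(\hat{x}) &= w^\top \hat{x}
  &&\Rightarrow\quad \nabla_{\hat{x}} D_w(\hat{x}) = w, \\
P(w) &= \lambda \left( \lVert w \rVert_2 - 1 \right)^2
  &&\Rightarrow\quad \nabla_w P(w) = 2\lambda \left( \lVert w \rVert_2 - 1 \right) \frac{w}{\lVert w \rVert_2}.
\end{aligned}
```

Here the penalty $P$ depends on the parameters $w$ only through the input gradient $\nabla_{\hat{x}} D_w$. For a deep critic that gradient is itself the output of a backprop pass, so obtaining $\nabla_w P$ means differentiating through that pass.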
  5. Double backprop in Chainer v3 • Many functions now support double backprop • Those functions are rewritten to implement a new interface named FunctionNode (such functions are called new-style Functions) • backward() takes Variables instead of ndarrays as grad_outputs and returns Variables, which means backward() itself can be differentiated • Variable now has an attribute grad_var, which represents the gradient as a Variable (so that we can use it in the computational graph)
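The idea that a backward pass built from graph-recording operations can itself be differentiated can be sketched without Chainer. The `Var` class and `grad` function below are hypothetical toy names, not Chainer's `Variable` or `chainer.grad`; the sketch only handles scalar `+` and `*`.

```python
class Var:
    """A scalar node in a toy computational graph (not Chainer's Variable)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        # Each entry is (parent, fn), where fn maps the upstream gradient
        # (a Var) to this node's gradient contribution for that parent (a Var).
        self.parents = parents

    def __add__(self, other):
        other = as_var(other)
        return Var(self.value + other.value,
                   ((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        other = as_var(other)
        # The local gradients are built from Var operations, so the backward
        # computation is recorded in the graph and can be differentiated again.
        return Var(self.value * other.value,
                   ((self, lambda g: g * other), (other, lambda g: g * self)))

    __radd__ = __add__
    __rmul__ = __mul__


def as_var(x):
    return x if isinstance(x, Var) else Var(x)


def grad(output, wrt):
    """Return d(output)/d(wrt) as a Var, using reverse-mode accumulation."""
    order, seen = [], set()

    def visit(node):  # depth-first post-order: parents come before children
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    grads = {id(output): Var(1.0)}
    for node in reversed(order):  # children are processed before parents
        g = grads.get(id(node))
        if g is None:
            continue
        for parent, backward_fn in node.parents:
            contribution = backward_fn(g)
            if id(parent) in grads:
                grads[id(parent)] = grads[id(parent)] + contribution
            else:
                grads[id(parent)] = contribution
    return grads.get(id(wrt), Var(0.0))


x = Var(3.0)
y = x * x * x        # y = x^3
dy = grad(y, x)      # first derivative, 3 * x^2
d2y = grad(dy, x)    # second derivative, 6 * x
print(dy.value, d2y.value)
```

Because `grad` returns a `Var` that is still connected to `x`, calling `grad` on its result yields the second derivative. This mirrors, in miniature, the role the slide attributes to Variable-valued `backward()` and `grad_var`.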
  6. How to implement WGAN-GP (1): using Variable.backward()
       x_tilde = generator(z)
       x_hat = x + u * (x_tilde - x)
       D(x_hat).backward(enable_double_backprop=True)  # 1st diff
       # lam is the penalty weight (lambda is a reserved word in Python);
       # the L2 norm and batch mean of the formula are omitted for brevity
       gp = lam * (x_hat.grad_var - 1) ** 2
       loss = D(x_tilde) - D(x) + gp
       model.cleargrads()  # to clear the 1st diff of params
       loss.backward()  # 2nd diff
  7. How to implement WGAN-GP (2): using grad()
       x_tilde = generator(z)
       x_hat = x + u * (x_tilde - x)
       gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)  # 1st diff
       # lam is the penalty weight (lambda is a reserved word in Python);
       # the L2 norm and batch mean of the formula are omitted for brevity
       gp = lam * (gx_hat - 1) ** 2
       loss = D(x_tilde) - D(x) + gp
       loss.backward()  # 2nd diff
     This version is more efficient because grad() can skip the gradient computation for parameters, so cleargrads() is no longer needed.
  8. New-style Function support • Most “standard” functions are now ported to the new-style interface: +, -, *, Convolution2D, Deconvolution2D, EmbedID, Linear, LSTM, BatchNormalization, sigmoid, relu, leaky_relu, softmax, log_softmax, tanh, exp, mean_squared_error, softmax_cross_entropy, dropout, layer_normalization, transpose, reshape, broadcast_to, sum, concat, __getitem__, etc. • We are still working on widening the double backprop support. Contributions are also welcome!
  9. Other features • Functions: layer_normalization, selu, arctan2, prod, NumPy-compatible matmul • Links: ChildSumTreeLSTM, NaryTreeLSTM, BatchRenormalization • Other new features: LeCunNormal, as_variable(), Variable.array, strict option of load_npz(), etc.
  10. CuPy v2.0.0rc1 • Sparse matrix support • Complex number support • Improved memory allocator • Many new functions, especially linear algebra routines
  11. Sparse matrix support • cupy.sparse: sparse matrix support with APIs compatible with scipy.sparse • CSR/CSC/COO and diagonal formats • Basic arithmetic, matrix product, element indexing • Slicing along the major axis • Dense <-> Sparse conversion
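For readers unfamiliar with the CSR format named above, here is a minimal pure-Python sketch of its three arrays. The names `data`, `indices`, and `indptr` follow the attribute names of `scipy.sparse.csr_matrix`; the helper functions are hypothetical and are not part of cupy.sparse. The sketch also suggests why slicing along the major axis (rows, for CSR) is cheap: it only needs a contiguous sub-range of each array.

```python
def dense_to_csr(dense):
    """Convert a list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, row by row
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR matrix by a dense vector."""
    result = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        result.append(sum(data[k] * vec[indices[k]] for k in range(start, end)))
    return result


def csr_row_slice(data, indices, indptr, start_row, stop_row):
    """Take rows [start_row, stop_row) by slicing each array contiguously."""
    lo, hi = indptr[start_row], indptr[stop_row]
    new_indptr = [p - lo for p in indptr[start_row:stop_row + 1]]
    return data[lo:hi], indices[lo:hi], new_indptr


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
data, indices, indptr = dense_to_csr(dense)
product = csr_matvec(data, indices, indptr, [1, 1, 1])
last_two_rows = csr_row_slice(data, indices, indptr, 1, 3)
print(data, indices, indptr, product, last_two_rows)
```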
  12. Complex number support • CuPy now supports complex numbers! • Dtypes complex64 and complex128 are now available • Routines related to complex numbers: angle, conj, imag, real
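The four routines listed above can be tried as follows. This sketch uses NumPy as a stand-in so that it runs without a GPU. That `import cupy as xp` behaves the same way is an assumption based on CuPy's NumPy-compatible API and has not been verified here against v2.0.0rc1.

```python
import numpy as xp  # assumption: `import cupy as xp` gives the same results on a GPU

z = xp.array([1 + 1j, -2j], dtype=xp.complex64)

re = xp.real(z)       # real parts
im = xp.imag(z)       # imaginary parts
zc = xp.conj(z)       # complex conjugates
phase = xp.angle(z)   # arguments in radians

print(re, im, zc, phase)
```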
  13. Linear algebra routines • Solvers, matrix inversion, determinant, eigenvalues, etc.: solve, tensorsolve, inv, pinv, det, slogdet, eigh, eigvalsh, matrix_rank • All under cupy.linalg namespace • einsum is also supported (thanks, @fukatani!) • Flexible tensor product/reduction based on Einstein convention
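A short sketch of a few of these routines follows, again with NumPy standing in for CuPy so that it runs without a GPU. Whether every call and every einsum subscript pattern shown here is supported by CuPy v2.0.0rc1 is an assumption; the slide only lists the routine names.

```python
import numpy as xp  # assumption: `import cupy as xp` exposes the same routines

a = xp.array([[2.0, 1.0],
              [1.0, 3.0]])
b = xp.array([1.0, 2.0])

x = xp.linalg.solve(a, b)            # solves a @ x = b
sign, logdet = xp.linalg.slogdet(a)  # sign and log|det(a)|, stable for large matrices
eigvals = xp.linalg.eigvalsh(a)      # eigenvalues of a symmetric matrix

# Einstein convention: repeated index j is summed, b is a batch index.
lhs = xp.arange(12.0).reshape(2, 3, 2)
rhs = xp.arange(8.0).reshape(2, 2, 2)
batched = xp.einsum('bij,bjk->bik', lhs, rhs)  # batched matrix product
trace = xp.einsum('ii->', a)                   # reduction over the diagonal

print(x, sign, logdet, eigvals, batched.shape, trace)
```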
  14. Improved memory allocator • The memory pool is greatly improved • It now uses a “best-fit with coalescing” algorithm • A memory region is reused even if the size does not exactly match • It may also improve speed, thanks to the reduced number of reallocations • Example: the new seq2seq example originally used all the memory of a 12 GB GPU; its usage is now reduced to 3 GB, and the execution time is reduced by approx. 25%.
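The general "best-fit with coalescing" strategy can be sketched with a toy allocator over a single arena. `ToyPool` is a hypothetical illustration of the textbook algorithm and is not CuPy's memory pool, whose actual data structures are not described in the slides.

```python
class ToyPool:
    """Toy best-fit allocator with coalescing over one contiguous arena."""

    def __init__(self, size):
        self.free_blocks = [(0, size)]  # sorted list of (offset, length)
        self.used = {}                  # offset -> length of allocated blocks

    def malloc(self, size):
        # Best fit: pick the smallest free block that is large enough,
        # even if its size does not match the request exactly.
        candidates = [blk for blk in self.free_blocks if blk[1] >= size]
        if not candidates:
            raise MemoryError("no free block is large enough")
        offset, length = min(candidates, key=lambda blk: blk[1])
        self.free_blocks.remove((offset, length))
        if length > size:
            # Split the block and keep the remainder available.
            self.free_blocks.append((offset + size, length - size))
            self.free_blocks.sort()
        self.used[offset] = size
        return offset

    def free(self, offset):
        size = self.used.pop(offset)
        self.free_blocks.append((offset, size))
        self.free_blocks.sort()
        # Coalesce: merge free blocks that touch each other.
        merged = [self.free_blocks[0]]
        for off, length in self.free_blocks[1:]:
            last_off, last_len = merged[-1]
            if last_off + last_len == off:
                merged[-1] = (last_off, last_len + length)
            else:
                merged.append((off, length))
        self.free_blocks = merged


pool = ToyPool(100)
a = pool.malloc(30)
b = pool.malloc(30)
c = pool.malloc(40)
pool.free(a)
pool.free(b)          # the two freed 30-unit blocks merge into one 60-unit block
d = pool.malloc(50)   # fits only because of coalescing
print(d, pool.free_blocks)
```

Without the merge step in `free`, the final 50-unit request would fail even though 60 units are free in total, which is the kind of fragmentation that coalescing is meant to avoid.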
  15. Next versions • As you may know, we slightly changed the release policy again; the stable releases may now include some new features (thus v2.1.0 instead of v2.0.3). • v4 is scheduled based on our release policy: v4.0.0 will be released three months after v3.0.0 (which will be mid-Jan. if there is no delay). • The core features of v4 are not determined yet; let’s have discussions!