
Towards Chainer v1.5


Slides from the talk at the Chainer Meetup on Oct. 14, 2015 at PFN.



  1. Towards Chainer v1.5
     10/14 Chainer meetup @ PFI/PFN
     Seiya Tokui (Preferred Networks)
  2. Development history
     - 6/12: v1.0
       – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
     - 7/7: v1.1
       – Caffe reference model, type checking (forward/backward), Py3 support
     - 8/19: v1.2
       – Many functions added, collect_parameters deprecated, type checking on backward removed
     - 9/2: v1.3
       – CuPy, functions module reorganized
  3. CuPy
     - CUDA array implementation with a NumPy-subset API
     - Custom elementwise and reduction kernels are still supported (with broadcasting)
     - No dependence on PyCUDA or scikits.cuda
       – Cf. the sudden renaming of scikits.cuda to scikit-cuda
     - NumPy API coverage is still incomplete
     - Most operations are not yet supported at the Function/Variable level
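Because CuPy tracks a subset of the NumPy API, array code can be written once against an interchangeable array module. A minimal sketch of the idea (NumPy only here; on a GPU machine `xp` could be bound to `cupy` instead — illustrative, not a Chainer example):

```python
import numpy as np

# Bind the array module once; since CuPy mirrors a subset of the NumPy API,
# `xp = cupy` would run the same code on the GPU (sketch runs on NumPy only).
xp = np

def softmax(x):
    # Numerically stable softmax, written purely against the shared API subset.
    e = xp.exp(x - xp.max(x, axis=1, keepdims=True))
    return e / xp.sum(e, axis=1, keepdims=True)

probs = softmax(xp.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
# Each row of probs sums to 1.
```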
  4. Development history
     - 6/12: v1.0
       – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
     - 7/7: v1.1
       – Caffe reference model, type checking (forward/backward), Py3 support
     - 8/19: v1.2
       – Many functions added, collect_parameters deprecated, type checking on backward removed
     - 9/2: v1.3
       – CuPy, functions module reorganized
     - 10/28: v1.4 (planned, delayed)
       – Some functions added?
  5. The cause of the delay
     - New model structure (#363)
     - I’ve been working on this since the release of v1.3
     - It is unexpectedly difficult to settle the design
       – Still in the design phase
       – I’m planning to release this feature in v1.5
  6. Objective
     - Replacement of FunctionSet/Optimizer
     - Goals:
       – Provide a solid way of sharing and reusing (sub)network definitions
       – Avoid the “to_cpu/to_gpu trap” between FunctionSet and Optimizer
       – Portable save/load
       – Make all functions pure for more flexibility and reusability
  7. Solution (current idea)
     - Hierarchy of network definitions
     - Example:
       – An autoencoder uses an encoder network and a decoder network
       – Each of these networks might be an MLP, a ConvNet, etc.
       – An MLP consists of several fully-connected layers
       – Each fully-connected layer defines a simple operation on the input variable
     - Call each component a chain
     - Modeling in Chainer becomes linking several chains into one big chain
  8. Terminology
     - Link
       – A minimal component of a chain (e.g. Linear, Convolution2D, etc.)
       – A “parameterized function” in previous versions
       – It combines parameter variables with input variables to compute the output variables
     - Chain, ChainList
       – Compositions of child chains (including links)
       – Chain manages its child chains by a dictionary, while ChainList does so by a list
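A minimal pure-Python sketch of this terminology (illustrative only, not the actual Chainer v1.5 API): a link owns named parameter arrays, a chain manages child chains by a dictionary, and slash-separated parameter paths fall out of the hierarchy:

```python
import numpy as np

class Link:
    """Minimal component of a chain: binds parameter arrays to names."""
    def __init__(self, **params):
        self.params = params

    def namedparams(self):
        for name, p in self.params.items():
            yield '/' + name, p

class Chain:
    """Composition of child chains/links, managed by a dictionary."""
    def __init__(self, **children):
        self.children = children

    def namedparams(self):
        # Parameter paths are derived from the position in the hierarchy.
        for cname, child in self.children.items():
            for pname, p in child.namedparams():
                yield '/' + cname + pname, p

# A "Linear"-like link holding a weight matrix and a bias vector.
mlp = Chain(layer1=Link(W=np.zeros((3, 2)), b=np.zeros(2)))
paths = [name for name, _ in mlp.namedparams()]
# paths == ['/layer1/W', '/layer1/b']
```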
  9. Schematic of Link/Chain
     [Diagram] Example of a classifier with a multi-layer perceptron: a Classifier
     chain holds a predictor (an MLP chain of three Linear links, layer1–layer3)
     and computes the loss (a Function) from inputs x and labels t.
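The classifier diagram can be sketched in plain NumPy (hypothetical helper names, not Chainer code): a predictor of three Linear layers, and a classifier that turns (x, t) into a softmax cross-entropy loss:

```python
import numpy as np

def linear(x, W, b):
    # One fully-connected layer: y = xW + b.
    return x @ W + b

def mlp(x, params):
    # Three Linear layers as in the diagram (layer1-layer3), tanh in between.
    h = np.tanh(linear(x, *params[0]))
    h = np.tanh(linear(h, *params[1]))
    return linear(h, *params[2])

def classifier_loss(predictor, x, t):
    # Softmax cross-entropy of the predictor's output against labels t.
    y = predictor(x)
    y = y - y.max(axis=1, keepdims=True)
    logp = y - np.log(np.exp(y).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(t)), t].mean()

rng = np.random.RandomState(0)
params = [(rng.randn(4, 5), np.zeros(5)),
          (rng.randn(5, 5), np.zeros(5)),
          (rng.randn(5, 3), np.zeros(3))]
loss = classifier_loss(lambda x: mlp(x, params), rng.randn(2, 4), np.array([0, 2]))
```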
  10. Schematic of Link/Chain
      [Diagram] Example of a Variational AutoEncoder: a VariationalAutoEncoder
      chain holds encoder and decoder chains (each an MLP); x is encoded to z,
      and the loss adds the KL divergence (kld) and the negative log-likelihood (nll).
  11. Define-by-Run
      - Note that these diagrams do not mean the computational graph is fixed at the definition of the chains
        – The graph is constructed dynamically during the forward computation (define-by-run)
      - A chain might implement multiple methods that construct different graphs
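The define-by-run point can be illustrated with a toy tracer (illustrative names only, not the Chainer API): the "graph" is just whatever operations actually ran, so different methods of the same chain build different graphs:

```python
class Tracer:
    """Records which operations executed; stands in for the dynamically
    built computational graph (toy sketch, not Chainer internals)."""
    def __init__(self):
        self.graph = []

    def apply(self, op, x):
        self.graph.append(op)
        return x

class AutoEncoderSketch:
    def encode(self, tape, x):
        return tape.apply('encoder', x)

    def decode(self, tape, z):
        return tape.apply('decoder', z)

    def __call__(self, tape, x):
        # The full graph exists only because both methods actually ran.
        return self.decode(tape, self.encode(tape, x))

ae = AutoEncoderSketch()
full, enc_only = Tracer(), Tracer()
ae(full, 0)             # builds encoder -> decoder
ae.encode(enc_only, 0)  # builds encoder only
```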
  12. Example (gist:
  13. Example (gist:
  14. Example (gist:
      User can freely design the predictor chain.
  15. Example (gist:
  16. Example (gist:
      User can freely design the encoder/decoder chains.
  17. Planned features of Link/Chain/ChainList
      - The hierarchy is directly mapped to the HDF5 format on serialization
        – Only the parameters and auxiliary variables (computed by learning) are saved
      - Helper methods to traverse the hierarchy
        – Iterate over all subchains in the hierarchy
        – Iterate over all parameter variables in the hierarchy
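One way such a mapping could look (a sketch of the idea, not the actual serializer): each parameter's position in the chain hierarchy becomes a slash-separated HDF5 dataset path:

```python
def flatten(chain, prefix=''):
    # Walk the hierarchy; leaves become dataset paths, nested dicts stand in
    # for child chains (illustrative structure, not the real Chain type).
    out = {}
    for name, child in chain.items():
        path = prefix + '/' + name
        if isinstance(child, dict):
            out.update(flatten(child, path))
        else:
            out[path] = child  # a parameter or auxiliary variable
    return out

model = {'predictor': {'layer1': {'W': [[0.0, 0.0]], 'b': [0.0, 0.0]}}}
datasets = flatten(model)
# datasets keys: '/predictor/layer1/W' and '/predictor/layer1/b'
```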
  18. New Optimizer
      - Optimizer is also updated
      - Optimizer will be aware of its target chain
        – It tracks the migration of the target chain between CPUs and GPUs
      - Optimizer is also serializable (in HDF5 format)
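A toy sketch of what "aware of the target chain" means (hypothetical, minimal; not the real Optimizer API): setup() binds the target once, so update() always acts on the chain's current parameters rather than on a snapshot:

```python
class SGDSketch:
    """Toy SGD bound to a target parameter dict (not the real Optimizer)."""
    def __init__(self, lr=0.5):
        self.lr = lr
        self.target = None

    def setup(self, target):
        # Remember the chain itself, not a copy of its parameters,
        # so later moves/updates of the chain are always visible here.
        self.target = target

    def update(self, grads):
        for name, g in grads.items():
            self.target[name] -= self.lr * g

params = {'W': 1.0}
opt = SGDSketch(lr=0.5)
opt.setup(params)
opt.update({'W': 2.0})
# params['W'] is now 1.0 - 0.5 * 2.0 == 0.0
```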
  19. Parallel work: introduction of Cython
      - CuPy drawback: the CPU-side manipulation is slow
      - No single huge bottleneck: the causes of the slowdown are scattered
      - The easiest point to fix: ctypes
        – ctypes is very slow
        – Even fetching the current device consumes non-negligible running time
        – @okuta-san is working on replacing it with Cython
      - Major impact on the Chainer package
        – The low-level interface will change
        – The build setup is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed)
  20. Future work
      - Lazy computation
        – See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might want only some of them
        – Chainer currently computes eagerly, which causes unneeded computations
        – Avoiding unneeded computations is one of the easiest graph optimizations
        – More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms
      - Symbolic optimization of computations on Variables (loop fusion, etc.)
      - Variable tags (or annotations)
        – Cf. Blocks
      - Learning process abstraction, data loading abstraction, etc.