Introduction to Chainer: A Flexible Framework for Deep Learning

Seiya Tokui, Engineer at Preferred Infrastructure, Inc.
Introduction to Chainer:
A Flexible Framework for Deep Learning
2015-06-18 PFI/PFN Weekly Seminar
Seiya Tokui (Preferred Networks)
Self-Introduction
• Seiya Tokui @beam2d (Twitter, GitHub)
• Researcher at Preferred Networks
• Main focus: machine learning
  – Learning to Hash (master's degree)
  – Deep Learning, Representation Learning (current focus)
A Powerful, Flexible, and Intuitive Framework for Neural Networks
Today I will introduce:
• The features of Chainer
• How to use Chainer
• Some planned features
• (Slides in English, talk in Japanese)
Chainer: The Concept
Chainer is a framework for neural networks
• Official site: http://chainer.org
• Repository: https://github.com/pfnet/chainer
• Provided as a Python library (PyPI: chainer)
• Main features
  – Powerful: supports CUDA and multi-GPU computation
  – Flexible: supports almost arbitrary architectures
  – Intuitive: forward prop can be written as regular Python code
Elements of a neural network framework
• Multi-dimensional array implementations
• Layer implementations
  – Known by various names (layers, modules, blocks, primitives, etc.)
  – The smallest units of automatic differentiation
  – Contain forward and backward implementations
• Optimizer implementations
• Other components (data loading scheme, training loop, etc.)
  – These are also very important, though Chainer currently does not provide abstractions for them (future work)
Forward prop / Backprop
• Forward prop defines how we want to process the input data
• Backprop computes the gradient of the output (e.g. the loss) with respect to the learnable parameters
• Given the backward procedures of all layers, backprop can be written as their combination (a.k.a. reverse-mode automatic differentiation)
[Figure: input → hidden → hidden → output → loss func (compared with the ground truth); gradients flow back through each layer]
Backprop Implementation Paradigm (1)
Define-and-Run
• First, a computational graph is constructed. Then, it is repeatedly fed with minibatches to do forward/backward computation
• The computational graph can be seen as a program, and the forward/backward computation is done by its interpreter
  ◦ Caffe: the program is written in Prototxt
  ◦ Torch: the program is constructed by Lua scripts
  ◦ Theano-based frameworks: the program is constructed by Python scripts
Backprop Implementation Paradigm (2)
Define-and-Run (cont.)
• Pros
  – (Almost) no need for memory management
  – The computational graph can be implicitly optimized (cf. Theano)
• Cons
  – The program is fixed within the training loop
  – The interpreter must be capable of defining various forward computations, including control-flow statements like if and for
    ◦ Theano has dedicated functions for these (ifelse and scan), which are unintuitive and not Pythonic
  – The network definition is hard to debug, since an error occurs at the forward computation, far apart from the network definition
Backprop Implementation Paradigm (3)
Define-by-Run
• The forward computation is written as regular program code using special variables and operators; executing it simultaneously performs the forward computation and constructs the graph (just by storing the order of operations).
• The graph is then used for the backward computation.
• This paradigm enables us to use arbitrary control-flow statements in the forward computation
  – No need for a mini language and its interpreter
• It also makes the forward computation intuitive and easy to debug
Backprop Implementation Paradigm (4)
Define-by-Run (cont.)
• The computational graph can be modified within each iteration
• Example: Truncated BPTT (BackProp Through Time)
  – BPTT: backprop on a recurrent net
  – Truncated BPTT: truncate the backprop at some time point
  – Truncation is one type of modification of the computational graph
[Figure: an unrolled recurrent net whose backprop history is truncated at some time step]
Features of Chainer
• Define-by-Run scheme
  – Forward computation can contain any Python code (see the sketch below)
    ◦ if-else, for-else, break, continue, try-except-finally, list, dict, class, etc.
  – Users can modify the graph within the loop
    ◦ E.g. truncation can be done by unchain_backward (which unchains the graph backward from some variable)
    ◦ See the tutorial on recurrent nets: http://docs.chainer.org/en/latest/tutorial/recurrentnet.html
• Predefined functions
• GPU support via PyCUDA
Example: Training a multi-layer perceptron in one page
Full code is in the tutorial and the examples directory.
# Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10))
opt = optimizers.SGD()
opt.setup(
    model.collect_parameters())

# Forward computation
def forward(x, t):
    h1 = F.relu(model.l1(x))
    h2 = F.relu(model.l2(h1))
    y = model.l3(h2)
    return F.softmax_cross_entropy(y, t)

# Training loop
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(...)
        t = Variable(...)
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()
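
The Variable(...) lines above elide how the minibatches are built. A minimal sketch, assuming the MNIST data has already been loaded into NumPy arrays x_train (float32, shape (N, 784)) and y_train (int32 labels); both names are hypothetical:

x = Variable(numpy.asarray(x_train[i:i + batchsize], dtype=numpy.float32))
t = Variable(numpy.asarray(y_train[i:i + batchsize], dtype=numpy.int32))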
Example: Recurrent net language model in one page
Full code is in the tutorial and the examples directory.
# Model definition
model = FunctionSet(
    emb=F.EmbedID(1000, 100),
    x2h=F.Linear(100, 50),
    h2h=F.Linear( 50, 50),
    h2y=F.Linear( 50, 1000))
opt = optimizers.SGD()
opt.setup(
    model.collect_parameters())

# Forward computation of one step
def fwd1step(h, w, t):
    x = F.tanh(model.emb(w))
    h = F.tanh(model.x2h(x) + model.h2h(h))
    y = model.h2y(h)
    return h, F.softmax_cross_entropy(y, t)

# Full RNN forward computation
def forward(seq):
    h = Variable(...)  # init state
    loss = 0
    for curw, nextw in zip(seq, seq[1:]):
        x = Variable(curw)
        t = Variable(nextw)
        h, new_loss = fwd1step(h, x, t)
        loss += new_loss
    return loss
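
A minimal sketch of one training step for this model, assuming seq is a list of int32 word-ID arrays for a minibatch (the surrounding loop over the corpus is omitted):

opt.zero_grads()
loss = forward(seq)    # unrolls the recurrent net over the whole sequence
loss.backward()        # backprop through time
opt.update()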
Chainer: How to Use It
Install Chainer
• Prepare a Python 2.7 environment with pip
  – (Pyenv +) Anaconda is recommended
• Install Chainer just by
  pip install chainer
• If you want to use GPU(s), do the following:
  – Install CUDA and the corresponding NVIDIA driver
  – Install the dependent packages by
    pip install chainer-cuda-deps
  – You may have to update the six package:
    pip install -U six
Run the MNIST example (quick start)
• Requires scikit-learn: pip install scikit-learn
• Clone the Chainer repository:
  git clone https://github.com/pfnet/chainer
• Go to the example directory at examples/mnist
• Then run python train_mnist.py
  – Run on GPU by passing --gpu=0
• Other examples can be executed similarly (some need manual preparation of datasets)
Read the documents
• Read the documentation at http://docs.chainer.org
• It includes:
  – Tutorial
  – Reference manual
• All the features shown in this talk are covered by the tutorial, so please try it if you want to know the details.
Basic concepts (1)
• The essential parts of Chainer: Variable and Function
• Variable is a wrapper of n-dimensional arrays (ndarray and GPUArray)
• Function is an operation on Variables
  – A Function application is remembered by the returned Variable(s)
  – All operations for which you want to backprop must be done by Functions on Variables
• Making a Variable object is simple: just pass an array
  x = chainer.Variable(numpy.ndarray(...))
  – The array is stored in the data attribute (x.data)
Basic concepts (2)
• Example of computational graph construction
  x = chainer.Variable(...)
  y = chainer.Variable(...)
  z = x**2 + 2*x*y + y
• The gradient of z(x, y) can be computed by z.backward() (see the worked example below)
• The results are stored in x.grad and y.grad
[Figure: the expression graph built for z, combining x and y through the nodes _ ** 2, 2 * _, _ * _, and _ + _]
Note: Split nodes are actually inserted automatically (they accumulate the gradients on backprop)
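
A worked, hedged example of this slide (the concrete values are made up; a single-element Variable lets backward() start from a gradient of one):

import numpy
import chainer

x = chainer.Variable(numpy.array([3.0], dtype=numpy.float32))
y = chainer.Variable(numpy.array([5.0], dtype=numpy.float32))
z = x**2 + 2*x*y + y

z.backward()
print(x.grad)   # dz/dx = 2*x + 2*y -> [16.]
print(y.grad)   # dz/dy = 2*x + 1  -> [7.]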
Basic concepts (3)
• Chainer provides many functions in the chainer.functions subpackage
  – This package is often abbreviated as F
• Parameterized functions are provided as classes
  – Linear, Convolution2D, EmbedID, PReLU, BatchNormalization, etc.
  – Their instances should be shared across all iterations
• Non-parameterized functions are provided as plain Python functions
  – Activation functions, pooling, array manipulation, etc. (see the sketch below)
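
A hedged sketch of the distinction, using the v1-era names from this slide (the shapes are illustrative):

import numpy
from chainer import Variable, functions as F

fc = F.Linear(784, 100)   # parameterized: an object holding W and b, created once and reused

x = Variable(numpy.zeros((32, 784), dtype=numpy.float32))
h = F.relu(fc(x))         # F.relu is a plain, stateless function; fc is the shared instance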
Basic concepts (4)
• Use FunctionSet to manage parameterized functions
  – It is an object with Function attributes
  – Easy to migrate functions onto GPU devices
  – Easy to collect parameters and gradients (collect_parameters)
• Use Optimizer for numerical optimization
  – Major algorithms are provided: SGD, MomentumSGD, AdaGrad, RMSprop, ADADELTA, Adam
  – Some parameter/gradient manipulations are done via this class: weight decay, gradient clipping, etc. (see the sketch below)
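
A minimal sketch of an Optimizer-driven update step. The weight_decay and clip_grads calls are assumed helpers of the v1-era Optimizer class, shown only to illustrate where such gradient manipulations would fit:

opt = optimizers.MomentumSGD(lr=0.01, momentum=0.9)
opt.setup(model.collect_parameters())   # model is a FunctionSet as above

opt.zero_grads()
loss = forward(x, t)
loss.backward()
opt.weight_decay(0.0001)   # assumed helper: adds an L2 penalty to the gradients
opt.clip_grads(10.0)       # assumed helper: rescales gradients whose norm exceeds the threshold
opt.update()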
Easy to debug!
• If the forward computation has a bug, an error occurs immediately at the corresponding line of the forward definition
• Example
  – This code has an inconsistency in the array sizes:
    x = Variable(np.ndarray((3, 4), dtype=np.float32))
    y = Variable(np.ndarray((3, 3), dtype=np.float32))
    a = x ** 2 + x
    b = a + y * 2    # ← an exception is raised at this line (shapes (3, 4) and (3, 3) do not match)
    c = b + x * 2
  – Since the exception is raised at the offending line, we can easily find the cause of the bug (this is one big difference from Define-and-Run frameworks)
Graph manipulation (1)
• Backward unchaining: y.unchain_backward()
  – It purges the nodes backward from y
  – It is useful for implementing truncated BPTT (see the PTB example and the sketch below)
[Figure: a chain x → f → y → g → z; after y.unchain_backward(), only y → g → z remains]
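
A hedged sketch of truncated BPTT built on unchain_backward, reusing fwd1step, model, and opt from the recurrent net example (the truncation length of 30 and the hidden size of 50 are illustrative; numpy is imported as np):

h = Variable(np.zeros((batchsize, 50), dtype=np.float32))   # initial hidden state
loss = 0
for i, (curw, nextw) in enumerate(zip(seq, seq[1:])):
    h, new_loss = fwd1step(h, Variable(curw), Variable(nextw))
    loss += new_loss
    if (i + 1) % 30 == 0:          # every 30 steps: update, then truncate the history
        opt.zero_grads()
        loss.backward()
        loss.unchain_backward()    # cut the graph so the next backward stops here
        opt.update()
        loss = 0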
Graph manipulation (2)
• Volatile variables: x = Variable(..., volatile=True)
  – A volatile variable does not build a graph
  – Volatility can be accessed directly via x.volatile
  x = Variable(..., volatile=True)
  y = f(x)
  y.volatile = False
  z = h(y)
[Figure: only the part of the chain after y is recorded in the graph; the volatile part up to y builds no graph]
Example: Training a multi-layer perceptron in one page
Note: F = chainer.functions
# Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10))
opt = optimizers.SGD()
opt.setup(
    model.collect_parameters())

# Forward computation
def forward(x, t):
    h1 = F.relu(model.l1(x))
    h2 = F.relu(model.l2(h1))
    y = model.l3(h2)
    return F.softmax_cross_entropy(y, t)

# Training loop
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(...)
        t = Variable(...)
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()
Example: Recurrent net language model in one page

# Model definition
model = FunctionSet(
    emb=F.EmbedID(1000, 100),
    x2h=F.Linear(100, 50),
    h2h=F.Linear( 50, 50),
    h2y=F.Linear( 50, 1000))
opt = optimizers.SGD()
opt.setup(
    model.collect_parameters())

# Forward computation of one step
def fwd1step(h, w, t):
    x = F.tanh(model.emb(w))
    h = F.tanh(model.x2h(x) + model.h2h(h))
    y = model.h2y(h)
    return h, F.softmax_cross_entropy(y, t)

# Full RNN forward computation
def forward(seq):
    h = Variable(...)  # init state
    loss = 0
    for curw, nextw in zip(seq, seq[1:]):
        x = Variable(curw)
        t = Variable(nextw)
        h, new_loss = fwd1step(h, x, t)
        loss += new_loss
    return loss
CUDA support (1)
• Chainer supports CUDA computation
• Installation
  – Install CUDA 6.5+
  – Install the CUDA-related packages by
    pip install chainer-cuda-deps
    ◦ The build of PyCUDA may fail if you installed CUDA into a non-standard path. In that case, you have to install PyCUDA from source with an appropriate configuration.
CUDA support (2)
• Call cuda.init() before any CUDA-related operation
• Convert a numpy.ndarray into a GPUArray by chainer.cuda.to_gpu
  data_gpu = chainer.cuda.to_gpu(data_cpu)
• A GPUArray object can be passed to the Variable constructor
  x = Variable(data_gpu)
• Most functions support GPU Variables
  – Parameterized functions must be sent to the GPU beforehand by Function.to_gpu or FunctionSet.to_gpu
• Extract the results to host memory by chainer.cuda.to_cpu (see the sketch below)
• All examples support CUDA (pass --gpu=N, where N is the GPU ID)
MLP example for CUDA

# Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10)).to_gpu()
opt = optimizers.SGD()
opt.setup(
    model.collect_parameters())

# Forward computation
def forward(x, t):
    h1 = F.relu(model.l1(x))
    h2 = F.relu(model.l2(h1))
    y = model.l3(h2)
    return F.softmax_cross_entropy(y, t)

# Training loop
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(to_gpu(...))
        t = Variable(to_gpu(...))
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()
CUDA support (3)
• Chainer also supports computation on multiple GPUs (easily!)
• Model parallelism
  – Send FunctionSets to the appropriate devices (to_gpu accepts a GPU ID)
    model_0 = FunctionSet(...).to_gpu(0)
    model_1 = FunctionSet(...).to_gpu(1)
  – Copy Variable objects across GPUs with the copy function (see the sketch below)
    x_1 = F.copy(x_0, 1)
    ◦ This copy is tracked by the computational graph, so you don't need to deal with it on backprop
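
A hedged model-parallel sketch in the spirit of this slide: a hypothetical two-GPU split of the MLP, with the first two layers on GPU 0 and the last on GPU 1 (x is assumed to live on GPU 0 and t on GPU 1):

model_0 = FunctionSet(l1=F.Linear(784, 100), l2=F.Linear(100, 100)).to_gpu(0)
model_1 = FunctionSet(l3=F.Linear(100, 10)).to_gpu(1)

def forward(x, t):
    h = F.relu(model_0.l1(x))    # runs on GPU 0
    h = F.relu(model_0.l2(h))
    h = F.copy(h, 1)             # move the activation to GPU 1; the copy is part of the graph
    y = model_1.l3(h)            # runs on GPU 1
    return F.softmax_cross_entropy(y, t)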
CUDA support (4)
• Chainer also supports computation on multiple GPUs
• Data parallelism
  – A FunctionSet can be copied by copy.copy
    model = FunctionSet(...)
    model_0 = copy.copy(model).to_gpu(0)
    model_1 = model.to_gpu(1)
  – Set up the optimizer only for the master model
    opt.setup(model_0.collect_parameters())
  – After the data-parallel gradient computation, gather the gradients
    opt.accumulate_grads(model_1.gradients)
  – After the update, share the parameters across the model copies (see the sketch below)
    model_1.copy_parameters_from(model_0.parameters)
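
A hedged sketch of one data-parallel training step assembled from these pieces. The even split of the minibatch, the model-taking forward function, and the assumption that cuda.to_gpu accepts a device ID are all illustrative, not exact example code:

k = batchsize // 2
x0 = Variable(cuda.to_gpu(x_batch[:k], 0)); t0 = Variable(cuda.to_gpu(t_batch[:k], 0))
x1 = Variable(cuda.to_gpu(x_batch[k:], 1)); t1 = Variable(cuda.to_gpu(t_batch[k:], 1))

opt.zero_grads()
loss0 = forward(x0, t0, model_0)   # forward pass of the GPU-0 copy
loss1 = forward(x1, t1, model_1)   # forward pass of the GPU-1 copy
loss0.backward()
loss1.backward()

opt.accumulate_grads(model_1.gradients)            # add the GPU-1 gradients to the master model
opt.update()                                       # update the master (GPU-0) parameters
model_1.copy_parameters_from(model_0.parameters)   # share the updated parameters with the copy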
Model Zoo support (in the near future)
• Model Zoo is a place where pretrained models are registered
  – Provided by the BVLC Caffe team
  – It contains the Caffe reference models
• We are planning to support the Caffe reference models in three weeks (the next minor release)
  – Current design (it may change):
    f = CaffeFunction('path/to/model.caffemodel')
    x, t = Variable(...), Variable(...)
    y = f(inputs={'data': x, 'label': t}, outputs=['loss'])
  – It emulates Caffe networks using Chainer's functions
Note: development process
• Schedule
  – We are planning to release updates biweekly
  – Updates are classified into three groups
    ◦ Revision: bug fixes and updates that do not add or modify interfaces
    ◦ Minor: updates that add or modify interfaces without breaking backward compatibility
    ◦ Major: updates that are not backward compatible
• We are using the GitHub-flow process
• We welcome your PRs!
  – Please send them to the master branch
Wrap up
• Chainer is a powerful, flexible, and intuitive framework for neural networks in Python
• It is based on the Define-by-Run scheme, which makes it intuitive and flexible
• Chainer is a very young and still immature project
  – Its development started in mid-April (just two months ago)
  – We will add many functionalities (especially more functions)
  – We may add some abstraction of the whole learning process