2. Motivation
• ML is ultimately about automating tasks, in the hope that machines can do everything for humans
• For example, I want the machine to make a cup of
coffee for me
3. Motivation
• The old way: write complete, highly detailed program specifications to carry them out
• The AI way: collect many training examples that capture the variability of the real world, then train a general learning machine on this large data set.
4. Motivation
• But sometimes the data set is not big enough, and the model does not generalize well.
• NPI is an attempt to use neural methods to train
machines to carry out simple tasks based on a
small amount of training data.
5. NPI Goals
• 1. Long-term prediction: Model potentially long sequences
of actions by exploiting compositional structure.
• 2. Continual learning: Learn new programs by composing previously-learned programs, rather than from scratch.
• 3. Data efficiency: Learn generalizable programs from a
small number of example traces.
• 4. Interpretability: By looking at NPI’s generated
commands, we can understand what it is doing at multiple
levels of temporal abstraction.
6. Related Work
• Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.
• Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing machines." arXiv preprint arXiv:1410.5401 (2014).
9. Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
11. NPI core module
• The NPI core is an LSTM network that acts as a router between programs, conditioned on the current state observation and the previous hidden state
• Input: a learnable program embedding, program arguments passed in by the calling program, and a feature representation of the environment.
• Output: a key indicating which program to call next, arguments for that program, and a flag indicating whether the current program should terminate (see the sketch below).
(figure: the NPI core, an LSTM-based sequence model)
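Since the slides describe the core only verbally, here is a minimal PyTorch sketch of a single NPI core step; the dimensions and names (NPICore, fuse, key_out, ...) are assumptions for illustration, not the paper's exact architecture.

import torch
import torch.nn as nn

class NPICore(nn.Module):
    def __init__(self, state_dim=128, prog_dim=64, arg_dim=32, hidden=256, n_progs=16):
        super().__init__()
        self.prog_emb = nn.Embedding(n_progs, prog_dim)    # learnable program embeddings
        self.fuse = nn.Linear(state_dim + prog_dim + arg_dim, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.key_out = nn.Linear(hidden, prog_dim)         # key used to look up the next program
        self.arg_out = nn.Linear(hidden, arg_dim)          # arguments for the next call
        self.end_out = nn.Linear(hidden, 1)                # termination flag (probability)

    def forward(self, env_feat, prog_id, args, hc=None):
        # condition on the environment features, the current program embedding,
        # and the arguments passed in by the calling program
        x = torch.cat([env_feat, self.prog_emb(prog_id), args], dim=-1)
        h, c = self.lstm(torch.relu(self.fuse(x)), hc)
        key = self.key_out(h)                      # matched against stored program keys
        next_args = self.arg_out(h)
        p_end = torch.sigmoid(self.end_out(h))     # whether the current program should return
        return key, next_args, p_end, (h, c)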
14. Car Rendering
• Whatever the starting position, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. a frontal pose at 15° elevation.
20. Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
21. Adding Numbers
• Environment:
• Scratch pad containing the two numbers to be added, a carry row, and an output row.
• Locations of the 4 read/write pointers
• Programs:
• LEFT and RIGHT programs that move a carry pointer one step left or right, respectively.
• WRITE program that writes a specified value at the location of a specified pointer (a minimal sketch of this environment follows).
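As a concrete picture of this environment, here is a minimal NumPy sketch; the class and method names (ScratchPad, left, right, write) are illustrative, not the authors' code.

import numpy as np

class ScratchPad:
    def __init__(self, a_digits, b_digits):
        width = max(len(a_digits), len(b_digits)) + 1
        self.pad = np.zeros((4, width), dtype=int)   # rows: input a, input b, carry, output
        self.pad[0, width - len(a_digits):] = a_digits
        self.pad[1, width - len(b_digits):] = b_digits
        self.ptrs = [width - 1] * 4                  # one pointer per row, starting at the right

    def left(self, row):           # LEFT: move the row's pointer one column left
        self.ptrs[row] -= 1

    def right(self, row):          # RIGHT: move the row's pointer one column right
        self.ptrs[row] += 1

    def write(self, row, value):   # WRITE: put a value at the row's pointer
        self.pad[row, self.ptrs[row]] = value

    def observe(self):             # observation: the digits under the 4 pointers
        return [int(self.pad[r, p]) for r, p in enumerate(self.ptrs)]

pad = ScratchPad([9, 6], [1, 2, 5])    # set up 96 + 125
a, b, carry, _ = pad.observe()         # read the ones column: 6, 5, 0
pad.write(3, (a + b + carry) % 10)     # output digit 1; a carry step would handle the rest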
23. Adding Numbers
• All output actions (primitive atomic actions that can be performed on the environment) are performed with a single instruction – ACT (see the dispatch sketch below).
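A hedged sketch of that funneling, reusing the ScratchPad above: LEFT, RIGHT, and WRITE are not separate low-level outputs; the core emits one primitive ACT whose arguments select the operation, pointer, and value (the argument layout here is an assumption).

def act(pad, args):
    op, row, value = args       # assumed layout: (operation id, pointer/row, value)
    if op == 0:
        pad.left(row)           # ACT(0, row, _) moves the pointer left
    elif op == 1:
        pad.right(row)          # ACT(1, row, _) moves the pointer right
    else:
        pad.write(row, value)   # ACT(2, row, v) writes v under the pointer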
28. Car Rendering
• Environment:
• Rendering of the car (pixels), encoded with a CNN feature encoder (a sketch follows below).
• The current car pose is NOT provided.
• Target angle and elevation coordinates.
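A minimal sketch of such an encoder, assuming RGB input and module names of my choosing (CarEncoder); the observation is pixels plus the target pose only, since the current pose is withheld.

import torch
import torch.nn as nn

class CarEncoder(nn.Module):
    def __init__(self, state_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # -> (batch, 32)
        )
        self.fc = nn.Linear(32 + 2, state_dim)         # +2 for target (angle, elevation)

    def forward(self, image, target_pose):
        # fuse pixel features with the target coordinates into the state
        # vector that conditions the NPI core
        return torch.relu(self.fc(torch.cat([self.conv(image), target_pose], dim=-1)))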
30. Car Rendering
• Whatever the starting position, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. a frontal pose at 15° elevation.
41. Data Efficiency - Sorting
• Seq2Seq LSTM and NPI used the same number of layers and hidden units.
• Trained on length-20 arrays of single-digit numbers.
• NPI benefits from mining multiple subprogram examples per sorting instance.
(plot: accuracy vs. number of training examples)
42. Generalization - Sorting
• For each length from 2 up to 20, we provided 64 example bubble sort traces, for a total of 1216 examples.
• Then, we evaluated whether the network can learn to sort arrays beyond length 20.
44. Learning New Programs with a Fixed NPI Core
• Example task: find the max in an array
• RJMP: move all pointers to the right by repeatedly calling the RSHIFT program
• MAX: call BUBBLESORT and then RJMP
• Expand program memory by adding 2 slots. Randomly initialize, then learn by backpropagation with the NPI core and all other parameters fixed.
45. • 1. Randomly initialize new program vectors in memory
• 2. Freeze the core and all other program vectors
• 3. Backpropagate gradients only to the new program vectors (sketched below)
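A sketch of this protocol in PyTorch, reusing the hypothetical NPICore from earlier; the gradient mask is one way to restrict updates to the 2 new program vectors.

import torch

core = NPICore(n_progs=16 + 2)          # program memory expanded by 2 slots
# (in practice the trained weights would be loaded and the old rows copied over)

for p in core.parameters():
    p.requires_grad = False             # 2. freeze the core and other program vectors
core.prog_emb.weight.requires_grad = True

new_ids = torch.tensor([16, 17])        # 1. the randomly initialized new slots

def keep_new_rows(grad):
    mask = torch.zeros_like(grad)
    mask[new_ids] = 1.0
    return grad * mask                  # 3. gradients flow only to the new vectors

core.prog_emb.weight.register_hook(keep_new_rows)
optimizer = torch.optim.Adam([core.prog_emb.weight], lr=1e-3)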
46. • + Max: performance after adding the MAX program to memory.
• "Unseen" uses a test set of car models disjoint from the training set.
47. Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
48. Conclusion (1/2)
• NPI is an RNN/LSTM-based sequence-to-sequence translator that keeps track of calling programs while recursing into sub-programs
• NPI generalizes well in comparison to sequence-to-sequence LSTMs.
• A trained NPI with a fixed core can learn new tasks without forgetting the old ones
49. Conclusion (2/2)
• Provide far fewer examples, but with labels that contain richer information, allowing the model to learn compositional structure (it's like sending kids to school)
50. Further Discussion
• Can each task help each other during training?
• Can we share environment encoder?
• Any comments?
project page: http://www-personal.umich.edu/~reedscot/iclr_project.html