Value Iteration Networks
Aviv Tamar, Sergey Levine, and Pieter Abbeel
Presenter: Sungjoon Choi
arXiv:1602.02867v1 [cs.AI] 9 Feb 2016
This paper can be used for
Convolutional Networks
Today, we will see a very clever interpretation of CNNs!
A CNN is not just an efficient feature extractor: this
paper finds an analogy between the operations in a CNN and
the value iteration algorithm in reinforcement learning.
Convolutional Networks
When it comes to image processing, CNNs are used
almost everywhere!
Structured Prediction?
Structured prediction is an umbrella term for supervised
machine learning techniques that involve predicting
structured objects, rather than scalar discrete or real values.
Path Planning?
Why not just End to End?
Is it Deep Q Learning?
No, it is different.
DQN only models the Q-function with a CNN.
Reinforcement Learning
What makes RL different from other methods?
We only get the reward at certain points,
but we have to make a decision at every step.
RL: Value Iteration
So, we introduce the notion of value.
And of course, ways to find the value function.
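As a reminder of what value iteration computes, here is a minimal tabular sketch. The function name `value_iteration`, the array layout (`P` as actions x states x next states, `R` as states x actions), and the three-state chain are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=1000):
    """Tabular value iteration (illustrative sketch).

    P: (A, S, S) array, P[a, s, t] = probability of moving from s to t under a.
    R: (S, A) array, R[s, a] = immediate reward for taking a in s.
    Returns the value function V (S,) and a greedy policy (S,).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * V(t)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)  # V(s) = max_a Q(s, a)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)

# Hypothetical 3-state chain: action 0 moves left, action 1 moves right.
# State 2 is absorbing; the only reward is for stepping right from state 1.
P = np.zeros((2, 3, 3))
P[0, 0, 0] = P[0, 1, 0] = P[0, 2, 2] = 1.0  # left
P[1, 0, 1] = P[1, 1, 2] = P[1, 2, 2] = 1.0  # right
R = np.zeros((3, 2))
R[1, 1] = 1.0

V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy chain the value of a state decays by a factor of gamma for each extra step needed to collect the reward, which is the "reward only at certain points, decision at every step" picture from the previous slide.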
Value Iteration via CNN?
The paper says:
“We introduce the value iteration network: a fully
differentiable neural network with a planning module
embedded within.”
Value Iteration via CNN?
Value Iteration Block
The depth of the Q layer need not be the same as the
number of actions.
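The analogy can be sketched in a few lines of NumPy: stack the reward map and the current value map as two input channels, convolve them to get one Q channel per action, then max over channels to get the next value map. This is a hedged illustration with hand-set weights on a deterministic 4-connected grid. The names `conv2d_same`, `vi_block`, and `OFFSETS` are hypothetical, and in the actual network the convolution weights are learned and the number of Q channels is a free choice.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C, H, W) input channels, w: (A, C, k, k) kernels. Returns (A, H, W).
    """
    n_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((n_out, H, W))
    for di in range(k):
        for dj in range(k):
            patch = xp[:, di:di + H, dj:dj + W]  # input shifted by (di-pad, dj-pad)
            out += np.einsum("ac,chw->ahw", w[:, :, di, dj], patch)
    return out

def vi_block(reward, w, n_iters):
    """Run n_iters (>= 1) rounds of: Q = conv([reward, V]); V = max over channels."""
    V = np.zeros_like(reward)
    for _ in range(n_iters):
        Q = conv2d_same(np.stack([reward, V]), w)  # one channel per "action"
        V = Q.max(axis=0)                          # max-pooling over the channel axis
    return V, Q

# Hand-set kernels (an assumption for illustration): Q_a(i, j) = R(i, j) + gamma * V(neighbor_a)
gamma = 0.9
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
w = np.zeros((len(OFFSETS), 2, 3, 3))
for a, (dr, dc) in enumerate(OFFSETS):
    w[a, 0, 1, 1] = 1.0                # reward channel: centre pixel
    w[a, 1, 1 + dr, 1 + dc] = gamma    # value channel: neighbour reached by action a

reward = np.zeros((5, 5))
reward[2, 2] = 1.0                     # hypothetical goal in the middle of the grid
V, Q = vi_block(reward, w, n_iters=3)
greedy = Q.argmax(axis=0)              # greedy action index per cell
```

Each pass spreads value one cell further from the goal, so the number of iterations bounds how far planning information can travel across the grid.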
Value Iteration Network
VI Block
Value Iteration Network
Or just a feature extraction stage. (I guess)
Hierarchical VI Network
Grid-World Experiment
Input: Sequence of states (locations)
Output: Sequence of actions (controls)
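To make the input/output pairing concrete, the sketch below turns a path of grid locations into the action taken at each step. The four-move action set, the `(row, col)` convention, and the helper name `states_to_actions` are assumptions for illustration; the experiment's actual action set and encoding may differ.

```python
# Hypothetical action set: unit moves on a 4-connected grid, keyed by (d_row, d_col).
ACTIONS = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}

def states_to_actions(path):
    """Map a sequence of (row, col) locations to the move taken at each step."""
    actions = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        actions.append(ACTIONS[(r1 - r0, c1 - c0)])
    return actions

example_path = [(0, 0), (0, 1), (1, 1)]
example_actions = states_to_actions(example_path)
```

A path with N locations yields N - 1 actions, one supervised label per visited state except the last.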
Grid-World Experiment
Value Iteration Network vs. Direct Policy Learning
Mars Rover Navigation
Conclusion
A very clever idea: using a CNN as a building
block for solving inverse reinforcement
learning problems!
Make things differentiable and use deep
networks; deep learning tools will take care
of the rest.
Still at a conceptual level, but the potential is
limitless.