Value iteration networks

Value Iteration Networks
Aviv Tamar, Sergey Levine, and Pieter Abbeel
Presenter: Sungjoon Choi
arXiv:1602.02867v1 [cs.AI] 9 Feb 2016

Convolutional Networks
Today, we will see a very clever interpretation of CNNs!
A CNN is not just an efficient feature extractor: this
paper finds an analogy between the operations in a CNN and
the value iteration algorithm in reinforcement learning.

Convolutional Networks
When it comes to image processing, CNNs are used
almost everywhere!

Structured Prediction?
Structured prediction is an umbrella term for supervised
machine learning techniques that involve predicting
structured objects, rather than scalar discrete or real values.

Is it Deep Q Learning?
No, it is different.
DQN only uses a CNN to model the Q-function.

Reinforcement Learning
What makes RL different from other methods?
We only get the reward at certain points,
but we have to make a decision at every step.

RL: Value Iteration
So, we introduce the notion of value.
And of course, ways to find the value function.
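
As a rough illustration of "finding the value function", here is a minimal sketch of tabular value iteration on a small grid world. Everything in it is an assumption made for the example and is not taken from the slides: the name `value_iteration`, the four deterministic moves, the discount of 0.9, the rule that a move off the grid leaves the agent in place, and the convention that the reward is received on entering a cell.

```python
# Tabular value iteration on a tiny deterministic grid world (illustrative only).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(reward, gamma=0.9, iters=50):
    """Return the value map after `iters` Bellman backups.

    reward[r][c] is the reward received on entering cell (r, c).
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                q_values = []
                for dr, dc in ACTIONS:
                    # Assumption: a move off the grid leaves the agent in place.
                    nr = min(max(r + dr, 0), rows - 1)
                    nc = min(max(c + dc, 0), cols - 1)
                    # Q(s, a) = reward of next cell + discounted value of next cell.
                    q_values.append(reward[nr][nc] + gamma * value[nr][nc])
                # V(s) = max over actions of Q(s, a).
                new_value[r][c] = max(q_values)
        value = new_value
    return value


# A 3x3 grid with a single rewarding cell in the bottom-right corner.
grid_reward = [[0.0] * 3 for _ in range(3)]
grid_reward[2][2] = 1.0
grid_value = value_iteration(grid_reward)
```

Cells closer to the rewarding corner end up with higher values, so acting greedily with respect to the value map leads toward the reward.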

Value Iteration via CNN?
The paper says:
“We introduce the value iteration network: a fully
differentiable neural network with a planning module
embedded within.”

Value Iteration Block
The depth of the Q layer need not be the same as the
number of actions.
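
The analogy can be sketched in plain Python: one backup is a convolution over the reward and value maps that produces a stack of Q channels, followed by a max over the channel dimension. This is only a hand-written illustration under stated assumptions, not the paper's implementation. The names `conv2d_same`, `shift_kernel`, `make_move_kernels`, and `vi_block` are hypothetical, and the kernels are fixed by hand (five channels: four moves plus "stay") where the network would learn them. Since the kernels are free parameters, the number of Q channels is a design choice.

```python
def conv2d_same(image, kernel):
    """3x3 cross-correlation with zero padding, as in a CNN layer."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[i][j] * image[rr][cc]
            out[r][c] = total
    return out


def shift_kernel(dr, dc, weight):
    """Kernel that reads the neighbour at offset (dr, dc), scaled by weight."""
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[1 + dr][1 + dc] = weight
    return kernel


def make_move_kernels(gamma):
    """Hand-set kernels (an assumption; a VIN would learn these)."""
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # four moves + stay
    reward_kernels = [shift_kernel(dr, dc, 1.0) for dr, dc in moves]
    value_kernels = [shift_kernel(dr, dc, gamma) for dr, dc in moves]
    return reward_kernels, value_kernels


def vi_block(reward, reward_kernels, value_kernels, k):
    """Apply k rounds of: convolution -> Q channels -> max over channels."""
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(k):
        q_channels = []
        for w_r, w_v in zip(reward_kernels, value_kernels):
            q_r = conv2d_same(reward, w_r)
            q_v = conv2d_same(value, w_v)
            q_channels.append(
                [[a + b for a, b in zip(row_r, row_v)]
                 for row_r, row_v in zip(q_r, q_v)]
            )
        # Channel-wise max plays the role of max over actions.
        value = [[max(q[r][c] for q in q_channels) for c in range(cols)]
                 for r in range(rows)]
    return value


block_reward = [[0.0] * 3 for _ in range(3)]
block_reward[2][2] = 1.0
block_r_kernels, block_v_kernels = make_move_kernels(gamma=0.9)
block_value = vi_block(block_reward, block_r_kernels, block_v_kernels, k=50)
```

With these hand-set kernels the block reproduces ordinary value iteration on the grid. Every step is a convolution, a sum, or a max, so the same structure can be trained end to end when the kernels are learned.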

Value Iteration Network
VI Block

Value Iteration Network
Or just a feature extraction stage. (I guess)

Grid-World Experiment
Input: Sequence of states (locations)
Output: Sequence of actions (controls)

Grid-World Experiment
Value Iteration Network vs. Direct Policy Learning

Conclusion
Very clever idea: using a CNN as a building
block for solving the inverse reinforcement
learning problem!
Make things differentiable and use deep
networks; deep learning tools will take care
of the rest.
Still at a conceptual level, but the potential is
limitless.
