1. Computational Properties of the Hippocampus Increase the Efficiency of Goal-Directed Foraging through Hierarchical Reinforcement Learning
Front. Computational Neurosci.
Chalmers et al.
2016
3. Model-based Reward Learning Algorithm
▪ ‘Model-based’ means the agent uses representations of the environment, expectations, and prospective calculations to make cognitive predictions of future value [1]
▪ Previous artificial intelligence (AI) research suggests the model-based reward learning (MBRL) algorithm as a way to understand how an animal navigates to a goal (reward)
Model-based Q-learning update:
Q(s, a) = R(s, a) + γ · Σ_s′ T(s′ | s, a) · max_a′ Q(s′, a′)
s: current state, a: current action, a′: next action, T: transition probability, R: reward function
[1] Dayan and Berridge, 2014, Cogn Affect Behav Neurosci.
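The model-based update above can be sketched in a few lines of NumPy. This is a minimal toy illustration, not the paper's code: the 2-state environment and all variable names are invented for the example.

```python
import numpy as np

def model_based_q(T, R, gamma=0.9, iters=200):
    """Value iteration with a known model:
    T[s, a, s'] -- transition probabilities, R[s, a] -- expected reward."""
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)        # V(s') = max_a' Q(s', a')
        Q = R + gamma * (T @ V)  # Q(s,a) = R(s,a) + gamma * sum_s' T(s'|s,a) V(s')
    return Q

# Toy 2-state example: action 1 moves to state 1 and yields reward 1.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1.0   # action 0: stay in state 0
T[0, 1, 1] = 1.0   # action 1: go to state 1
T[1, :, 1] = 1.0   # state 1 is absorbing
R = np.array([[0.0, 1.0], [0.0, 0.0]])
Q = model_based_q(T, R)
```

Because the model (T, R) is known, the agent can compute values by prospective calculation rather than trial-and-error sampling, which is the defining feature of the model-based approach.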
4. Goal-directed Navigation based on MBRL
▪ A ‘tree search’ model for goal-directed navigation based on MBRL
▪ A rat faces a maze in which different turns lead to different states and rewards
[2] Daw, 2012, IEEE
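The tree-search idea can be sketched as a depth-limited forward search: expand each possible action, recurse on the successor state, and back up discounted values. The `LineMaze` toy world and all names here are illustrative assumptions, not the model from the paper.

```python
def tree_search(state, model, reward, depth, gamma=0.9):
    """Depth-limited forward search over a known transition model;
    backs up the best discounted value reachable within `depth` steps."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in model.actions(state):
        nxt = model.step(state, action)  # deterministic model for simplicity
        best = max(best, reward(nxt) + gamma * tree_search(nxt, model, reward, depth - 1, gamma))
    return best

class LineMaze:
    """Toy 1-D maze: states 0..4, with the reward placed at state 4."""
    def actions(self, state):
        return [-1, +1]
    def step(self, state, action):
        return min(4, max(0, state + action))

maze = LineMaze()
v = tree_search(2, maze, lambda s: 1.0 if s == 4 else 0.0, depth=3)
```

Like the rat at a choice point, the search considers each turn in sequence before committing to the action whose imagined future is most valuable.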
5. Place Cell and Reward
▪ Location-specific spatial information is processed in the brain by ‘place cells’
▪ Place cells show ‘forward sweeps’ (journey-dependent activity) related to the specific route taken during goal-directed navigation
[3] Grieves et al., 2016, eLIFE
6. Place Cell and Reward
▪ Dopamine is known to modulate synaptic plasticity
▪ The hippocampus, which contains place cells, receives dopaminergic synaptic input from the ventral tegmental area (VTA)
▪ The VTA contains the dopaminergic cell bodies and plays a major role in the reward circuitry of the brain
[4] Edelmann and Lessmann, 2013, Front. Neurosci. [5] Russo and Nestler, 2013, Nat. Rev. Neurosci.
7. Hierarchical Navigation Model
▪ Hierarchical cognitive map model
▫ The Learning Intelligent Distribution Agent (LIDA) has three phases similar to MBRL:
• Understanding
◦ Sensing the environment
◦ Detecting features
◦ Recognizing objects and categories
• Attending
• Action
[6] Madl et al., 2015, Neural Networks
8. Hierarchical Representation
▪ Place cells represent an animal’s spatial information as a hierarchical space
[7] Keinath et al., 2014, Hippocampus [8] Lyttle et al., 2013, Hippocampus
[Figure: standard environment (diam = 35 cm); different visual cue; large environment (diam = 70 cm)]
9. Hierarchical Spatial Information Processing
▪ Animals with lesions of the ventral hippocampus (vH) show delayed acquisition, requiring more trials to learn the task
▪ Animals with lesions of the dorsal hippocampus (dH) never learn to perform the water maze task as well as intact animals
[9] Ruediger et al., 2012, Nat. Neurosci.
10. Research Question
▪ Conventional MBRL algorithms do not fully explain animals’
ability to rapidly adapt to environmental change, or learn
multiple complex tasks
▪ Goal: implement a computational MBRL framework that incorporates features inspired by the computational properties of the hippocampus
▫ Hierarchical representation of space
▫ Forward sweeps
11. Method
▪ Simulated environment
▫ 16 by 48 discrete states in an environment
▫ Four actions (up, down, left and right)
▫ 10 trials in an open arena → boundaries were then added to the arena (5 mazes)
▫ The agent could not see or otherwise sense barriers or rewards
▫ Goal location triggers a ‘reward’ update
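The simulated environment above can be sketched as a small grid world. The class and all names below are illustrative assumptions, not the authors' code; only the 16 × 48 state space, the four actions, and the invisible barriers/goal come from the slide.

```python
class GridEnv:
    """Toy sketch of the simulated environment: a 16 x 48 grid of discrete
    states, four movement actions, and a goal that triggers a reward."""
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, rows=16, cols=48, goal=(0, 47)):
        self.rows, self.cols, self.goal = rows, cols, goal
        self.walls = set()  # added barriers; the agent cannot sense them

    def step(self, state, action):
        dr, dc = self.ACTIONS[action]
        nxt = (state[0] + dr, state[1] + dc)
        # Moves into walls or off the grid leave the agent in place.
        if nxt in self.walls or not (0 <= nxt[0] < self.rows and 0 <= nxt[1] < self.cols):
            nxt = state
        reward = 1.0 if nxt == self.goal else 0.0  # reward only at the goal
        return nxt, reward

env = GridEnv()
state, reward = env.step((0, 46), "right")  # step onto the goal
```

Because barriers are invisible, the agent only discovers them by bumping into them and staying in place, which is why the model of the environment must be learned and updated online.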
12. Method
▪ Computational model of place cell and reinforcement learning
[Figure: start-to-goal trajectory; abstracted spatial hierarchy (6 levels); MBRL for forward sweeps; hierarchical planning process]
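The hierarchical planning process can be sketched as coarse-to-fine search: first plan a route over abstract regions, then plan detailed steps only within the regions on that route. The two-level hierarchy and all names below are simplifying assumptions (the model uses 6 abstraction levels).

```python
from collections import deque

def shortest_path(neighbors, start, goal):
    """Breadth-first search over a graph given by a neighbors() function."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

def hierarchical_plan(region_of, region_neighbors, cell_neighbors, start, goal):
    # 1) Coarse plan over abstract regions.
    route = shortest_path(region_neighbors, region_of(start), region_of(goal))
    allowed = set(route)
    # 2) Fine plan restricted to cells whose region lies on the coarse route.
    restricted = lambda c: [n for n in cell_neighbors(c) if region_of(n) in allowed]
    return shortest_path(restricted, start, goal)
```

The efficiency gain is that the fine search never expands cells outside the coarse route, which mirrors the paper's finding that the hierarchical agent uses far fewer computational resources than flat MBRL.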
13. Hippocampal Lesion Simulation
▪ The agent with a simulated vH lesion learns the task more slowly than the unimpaired agent
▪ The agent with a simulated dH lesion has worse asymptotic performance
14. Efficient Spatial Navigation and Adaptation
▪ 10 trials in an open arena → maze
▪ The hierarchical approach adapted more quickly and used far fewer computational resources than the standard MBRL algorithm
15. Trapped by Learning
▪ The non-hierarchical approach is slow to adapt to the added boundaries
▪ The conventional MBRL algorithm requires many steps and much computation to adapt
[Figure: occupancy density of trajectories]
16. Probabilistic Reward Locations
▪ The reward was placed randomly on each trial at one of four locations in a 7 × 7 state environment
▪ The conventional MBRL algorithm could not solve this task
17. Conclusion
▪ The hierarchical representation by place cells enhances goal-directed navigation efficiency, even in a novel environment
▪ The hippocampus’s hierarchical representation of space can support computationally efficient learning and adaptation that may not be possible otherwise
18. Discussion
▪ Problems of previous research on MBRL models
▪ Previous goal-directed navigation models based on neuromorphic neural networks represent the environment as a non-hierarchical space of place cells
▪ The MBRL paradigm with place cells could be adapted to a neuromorphic neural network model based on place cells
▫ State: sensory information → spikes of place cells
▫ Action and transition probability → STDP learning rule between place cells
19. Discussion
▪ The authors assume that place cell activity arises from grid cell activity, and the model assumes that place cells are remapped by new rewards
[Figure: global remapping in place cells; grid structure of grid cells in a novel environment]