
Hierarchical Reinforcement Learning

Presentation about Hierarchical Reinforcement Learning



  1. Hierarchical Reinforcement Learning. David Jardim & Luís Nunes, ISCTE-IUL, 2009/2010
  2. Outline 1/2: Planning Process; The Problem and Motivation; Reinforcement Learning; Markov Decision Process; Q-Learning; Hierarchical Reinforcement Learning; Why HRL?; Approaches
  3. Outline 2/2: Semi-Markov Decision Process; Options; Until Now; Next Step - Simbad; Limitations of HRL; Future Work on HRL; Questions; References
  4. Planning Process
  5. The Problem and Motivation: a LEGO MindStorms robot with sensors, actuators and noise, whose purpose is to collect "bricks" and assemble them according to a plan. We decompose the global problem into sub-problems and try to solve it by implementing well-known RL and HRL techniques. (Image: http://lambcutlet.org/images/LEGO_Mindstorms_NXT_mini.jpg)
  6. Reinforcement Learning: a computational approach to learning (R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998). An agent tries to maximize the reward it receives when an action is taken, interacts with a complex, uncertain environment, and learns how to map situations to actions.
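A standard way to make that objective precise (following Sutton and Barto; not spelled out on the slide) is the expected discounted return the agent maximizes:

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1},
\qquad 0 \le \gamma < 1,
```

where the discount factor gamma trades immediate reward against long-term reward.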
  7. Markov Decision Process: a finite MDP is defined by a finite set of states S and a finite set of actions A. (See http://en.wikipedia.org/wiki/Markov_decision_process)
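The slide names only the states and actions; the standard definition in the cited Wikipedia article also includes the transition and reward model:

```latex
(S,\; A,\; P,\; R,\; \gamma), \qquad
P(s' \mid s, a) = \Pr\{\, s_{t+1} = s' \mid s_t = s,\; a_t = a \,\}, \qquad
R(s, a) = \mathbb{E}\left[ r_{t+1} \mid s_t = s,\; a_t = a \right],
```

with a discount factor gamma in [0, 1). The Markov property is the key assumption: the next state depends only on the current state and action, not on the earlier history.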
  8. Q-Learning [Watkins, C.J.C.H. '89]: an agent with a state set S and action set A performs an action a in order to change its state, and a reward is provided by the environment. The goal of the agent is to maximize its total reward. (See http://en.wikipedia.org/wiki/Q-learning)
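The update rule itself is not on the slide; the standard one-step Q-learning update is Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]. A minimal tabular sketch in Java; the integer state/action encoding and the Environment interface are illustrative assumptions, not part of the original project code:

```java
import java.util.Random;

/** Minimal tabular Q-learning agent. States and actions are plain ints;
 *  Environment is a hypothetical stand-in for whatever simulator is used. */
public class QLearner {
    interface Environment {
        int reset();              // start an episode, return the initial state
        int step(int action);     // apply an action, return the next state
        double reward();          // reward for the last transition
        boolean done();           // true once the episode has ended
    }

    private final double[][] q;   // Q(s, a) table, initialized to zero
    private final double alpha = 0.1, gamma = 0.95, epsilon = 0.1;
    private final Random rng = new Random();

    public QLearner(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    /** Epsilon-greedy action selection: mostly exploit, sometimes explore. */
    private int selectAction(int s) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q[s].length);
        int best = 0;
        for (int a = 1; a < q[s].length; a++) if (q[s][a] > q[s][best]) best = a;
        return best;
    }

    public void runEpisode(Environment env) {
        int s = env.reset();
        while (!env.done()) {
            int a = selectAction(s);
            int sNext = env.step(a);
            double r = env.reward();
            double maxNext = q[sNext][0];
            for (int a2 = 1; a2 < q[sNext].length; a2++)
                maxNext = Math.max(maxNext, q[sNext][a2]);
            // One-step update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            q[s][a] += alpha * (r + gamma * maxNext - q[s][a]);
            s = sNext;
        }
    }
}
```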
  9. Why HRL? To improve performance, since plain RL cannot be applied to problems with large state/action spaces (the curse of dimensionality). Sub-goals and abstract actions can be reused across different tasks (state abstraction), multiple levels of temporal abstraction become available, and state abstraction is obtained.
  10. Approaches: HAMs, Hierarchies of Abstract Machines (Parr & Russell, 1998); Options, Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales (Sutton, Precup & Singh, 1999); MAXQ Value Function Decomposition (Dietterich, 2000); Discovering Hierarchy in RL with HEXQ (Hengst, 2002)
  11. Semi-Markov Decision Process: an SMDP consists of a set of states S, a set of actions A, an expected cumulative discounted reward, and a well-defined joint distribution of the next state and transition time.
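What distinguishes the SMDP setting is that an action may run for a variable number of steps tau; the standard SMDP Q-learning update (not shown on the slide) therefore discounts by gamma to the power tau:

```latex
Q(s, a) \leftarrow Q(s, a)
  + \alpha \left[ r + \gamma^{\tau} \max_{a'} Q(s', a') - Q(s, a) \right],
```

where r is the cumulative discounted reward accumulated while the action was executing and s' is the state in which it completed.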
  12. Options [Sutton, Precup & Singh '99]: an Option is defined by a policy π: S × A → [0,1], a termination condition β: S⁺ → [0,1], and an initiation set I ⊆ S. It is hierarchical and used to reach sub-goals.
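That definition maps directly onto code. A minimal Java sketch, with the same illustrative int state/action encoding as the Q-learning sketch above, and the policy shown as the common deterministic special case of π:

```java
/** An option in the sense of Sutton, Precup & Singh (1999): a temporally
 *  extended action given by an initiation set, an internal policy, and a
 *  termination condition. */
public interface Option {
    boolean canInitiate(int state);      // I ⊆ S: states where the option may start
    int policy(int state);               // π: picks a primitive action (deterministic case)
    double terminationProb(int state);   // β: S⁺ → [0,1], probability of stopping here
}

/** Hypothetical example: an option that terminates at a fixed sub-goal state. */
class GoToSubGoal implements Option {
    static final int LEFT = 0, RIGHT = 1;  // assumed action encoding
    private final int subGoal;

    GoToSubGoal(int subGoal) { this.subGoal = subGoal; }

    public boolean canInitiate(int state) { return state != subGoal; }

    public int policy(int state) {
        // Illustrative 1-D corridor: step toward the sub-goal
        return state < subGoal ? RIGHT : LEFT;
    }

    public double terminationProb(int state) {
        return state == subGoal ? 1.0 : 0.0;  // terminate exactly at the sub-goal
    }
}
```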
  13. Until Now. [Figure: two options, O1 and O2]
  14. Until Now. [Plots: steps per episode from Sutton, Precup & Singh '99 alongside the same measure from my simulation]
  15. Next Step - Simbad: a Java 3D robot simulator with 3D visualization and sensing, range sensors (sonars and IR) and contact sensors (bumpers); see http://simbad.sourceforge.net/. It will allow us to simulate and learn first, and then transfer the learning to our LEGO MindStorms robot.
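For orientation, this is roughly what a Simbad agent looks like, following the Agent/EnvironmentDescription pattern from the simulator's tutorial; exact class and method names should be checked against the Simbad API, and the wander behavior is a placeholder, not our learning controller:

```java
import javax.vecmath.Vector3d;
import simbad.gui.Simbad;
import simbad.sim.Agent;
import simbad.sim.EnvironmentDescription;
import simbad.sim.RangeSensorBelt;
import simbad.sim.RobotFactory;

/** A robot that wanders, re-orienting at random every 100 simulation steps. */
class WanderingRobot extends Agent {
    RangeSensorBelt sonars;   // belt of range sensors around the body

    WanderingRobot(Vector3d position, String name) {
        super(position, name);
        sonars = RobotFactory.addSonarBeltSensor(this);
        RobotFactory.addBumperBeltSensor(this);   // contact sensors
    }

    public void initBehavior() { }

    /** Called once per simulation step. */
    public void performBehavior() {
        setTranslationalVelocity(0.5);
        if (getCounter() % 100 == 0)
            setRotationalVelocity(Math.PI / 2 * (0.5 - Math.random()));
    }
}

/** The world: just one robot at the origin. */
class SimpleEnv extends EnvironmentDescription {
    SimpleEnv() { add(new WanderingRobot(new Vector3d(0, 0, 0), "robot")); }
}

public class SimbadDemo {
    public static void main(String[] args) {
        new Simbad(new SimpleEnv(), false);   // false = run with the GUI
    }
}
```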
  16. Limitations of HRL: the effectiveness of these ideas on large and complex continuous control tasks is unproven, sub-goals are assigned manually, and some of the existing algorithms only work well for the problem they were designed to solve.
  17. Future Work on HRL: automated discovery of state abstractions; finding the best automated way to discover sub-goals to associate with Options; obtaining a long-lived learning agent that faces a continuing series of tasks and keeps evolving.
  18. Questions?
  19. References:
      R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
      R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems: Proceedings of the 1997 Conference, Cambridge, MA, 1998. MIT Press.
      R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
      T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
      B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Machine Learning: Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
