# Hierarchical Reinforcement Learning

1. Hierarchical Reinforcement Learning
   David Jardim & Luís Nunes, ISCTE-IUL, 2009/2010
3. Outline 1/2
   - Planning Process
   - The Problem and Motivation
   - Reinforcement Learning
   - Markov Decision Process
   - Q-Learning
   - Hierarchical Reinforcement Learning
   - Why HRL?
   - Approaches
4. Outline 2/2
   - Semi-Markov Decision Process
   - Options
   - Until Now
   - Next Step: Simbad
   - Limitations of HRL
   - Future Work on HRL
   - Questions
   - References
5. Planning Process
6. The Problem and Motivation
   (image: LEGO_Mindstorms_NXT_mini.jpg @ http://lambcutlet.org/images/)
   - LEGO MindStorms robot with sensors, actuators and noise
   - Purpose: collect "bricks" and assemble them according to a plan
   - Decompose the global problem into sub-problems
   - Try to solve the problem by implementing well-known RL and HRL techniques
7. Reinforcement Learning
   (@ R. S. Sutton, Reinforcement Learning: An Introduction, MIT Press, 1998)
   - A computational approach to learning
   - An agent tries to maximize the reward it receives when an action is taken
   - Interacts with a complex, uncertain environment
   - Learns how to map situations to actions
8. Markov Decision Process
   A finite MDP is defined by:
   - a finite set of states S
   - a finite set of actions A
   - state-transition probabilities and a reward function
   (@ http://en.wikipedia.org/wiki/Markov_decision_process)
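As a toy illustration (not from the slides), a finite MDP can be written down explicitly as transition and reward tables and solved by value iteration; the two-state environment below is hypothetical:

```python
# Hypothetical 2-state, 2-action MDP as explicit tables.
# P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
S = [0, 1]
A = [0, 1]
P = {
    0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

def value_iteration(gamma=0.9, iters=200):
    """Iterate the Bellman optimality backup V(s) = max_a [R(s,a) + gamma * E V(s')]."""
    V = {s: 0.0 for s in S}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in A)
             for s in S}
    return V

V = value_iteration()
```

Here staying in state 1 earns reward 2 forever, so V(1) converges to 2 / (1 - 0.9) = 20.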
9. Q-Learning [Watkins, 1989]
   - Agent with a state set S and action set A
   - Performs an action a in order to change its state
   - A reward is provided by the environment
   - The goal of the agent is to maximize its total reward
   (@ http://en.wikipedia.org/wiki/Q-learning)
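To make the update concrete, here is a minimal tabular Q-learning sketch on a hypothetical 5-state corridor (the environment and parameters are illustrative, not from the slides); the core is the update Q(s,a) ← Q(s,a) + α [r + γ max Q(s',·) − Q(s,a)]:

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the greedy value of s2
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy corridor: action 1 moves right, action 0 moves left; reward at state 4.
def step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

random.seed(0)
Q = q_learning(5, 2, step)
```

After training, the greedy policy moves right in every state, with Q(3, right) close to the true value of 1.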
10. Why HRL?
   - Improve performance
   - RL cannot be applied directly to problems with large state/action spaces (curse of dimensionality)
   - Sub-goals and abstract actions can be reused across different tasks
   - Multiple levels of temporal abstraction
   - Obtain state abstraction
11. Approaches
   - HAMs: Hierarchies of Abstract Machines (Parr & Russell, 1998)
   - Options: Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales (Sutton, Precup & Singh, 1999)
   - MAXQ Value Function Decomposition (Dietterich, 2000)
   - Discovering Hierarchy in RL with HEXQ (Hengst, 2002)
13. Semi-Markov Decision Process
   An SMDP consists of:
   - a set of states S
   - a set of actions A
   - an expected cumulative discounted reward
   - a well-defined joint distribution of the next state and transit time
14. Options [Sutton, Precup & Singh, 1999]
   An option is defined by:
   - a policy π: S × A → [0, 1]
   - a termination condition β: S⁺ → [0, 1]
   - an initiation set I ⊆ S
   Options are hierarchical and are used to reach sub-goals.
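The (I, π, β) triple can be sketched directly in code. This hypothetical example executes a "go right until the goal" option on a one-dimensional corridor (the environment and all names are illustrative, not from the slides):

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option in the sense of Sutton, Precup & Singh (1999)."""
    initiation: Set[int]            # I subset of S: states where the option may start
    policy: Callable[[int], int]    # pi: S -> A (deterministic sketch of pi: S x A -> [0,1])
    beta: Callable[[int], float]    # beta: S -> [0,1], probability of terminating in a state

def run_option(option, step, s):
    """Execute the option until beta terminates it; return (s', total reward, duration k).

    In an SMDP-style update the duration k matters: the value of s' is
    discounted by gamma**k rather than by a single gamma."""
    assert s in option.initiation
    total, k = 0.0, 0
    while True:
        s, r, done = step(s, option.policy(s))
        total, k = total + r, k + 1
        if done or random.random() < option.beta(s):
            return s, total, k

# Corridor environment: action 1 moves right, action 0 left; reward at state 4.
def step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

go_right = Option(initiation={0, 1, 2, 3},
                  policy=lambda s: 1,
                  beta=lambda s: 1.0 if s == 4 else 0.0)
s, reward, k = run_option(go_right, step, 0)  # -> (4, 1.0, 4)
```

Invoked in state 0, the option runs four primitive steps and terminates at the goal, which is exactly the temporally extended behaviour an SMDP treats as a single action.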
15. Until Now
   (figure: environment with two options, O1 and O2)
16. Until Now
   (figures: steps per episode, from Sutton, Precup & Singh 1999 and from my simulation)
17. Next Step: Simbad
   (@ http://simbad.sourceforge.net/)
   - Java 3D robot simulator
   - 3D visualization and sensing
   - Range sensors: sonars and IR
   - Contact sensors: bumpers
   - Will allow us to simulate and learn first, then transfer the learning to our LEGO MindStorms robot
18. Limitations of HRL
   - The effectiveness of these ideas on large, complex continuous control tasks is unproven
   - Sub-goals are assigned manually
   - Some existing algorithms only work well for the problem they were designed to solve
19. Future Work on HRL
   - Automated discovery of state abstractions
   - Find the best automated way to discover sub-goals to associate with options
   - Obtain a long-lived learning agent that faces a continuing series of tasks and keeps evolving
20. Questions?
22. References
   - R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
   - R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems: Proceedings of the 1997 Conference. MIT Press, Cambridge, MA, 1998.
   - R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
   - T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
   - B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Machine Learning: Proceedings of the Nineteenth International Conference on Machine Learning, 2002.