An option is a temporally extended action with a well-defined internal policy.
The set of options (O) replaces the set of actions (A).
Learning occurs outside options (each option's internal policy is fixed).
Learning over options → Semi-MDP Q-Learning.
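The SMDP Q-learning backup above can be sketched as follows. This is a minimal illustration, not the slides' implementation; the table `Q`, the step sizes, and the state/option names are assumptions. The key difference from ordinary Q-learning is that the discount is `gamma**k`, where `k` is the number of time steps the option ran.

```python
from collections import defaultdict

GAMMA = 0.9
ALPHA = 0.1

Q = defaultdict(float)  # Q[(state, option)], initialized to 0

def smdp_q_update(s, o, r, k, s2, options):
    """One SMDP Q-learning backup after option o ran k primitive steps
    from state s, accumulating discounted reward r and ending in s2."""
    best_next = max(Q[(s2, o2)] for o2 in options)
    # gamma**k: the option lasted k time steps, so future value is
    # discounted by the option's duration, not by a single step.
    Q[(s, o)] += ALPHA * (r + GAMMA**k * best_next - Q[(s, o)])
```

With `Q` all zeros, one update `smdp_q_update('a', 'go', 1.0, 3, 'b', ['go', 'stay'])` moves `Q[('a', 'go')]` toward the observed return by a step of `ALPHA`.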
[Figure: example machine "Move e + Collision Avoidance". The machine repeatedly executes Move e; at a Choose node it either continues east, or, on hitting an Obstacle, calls avoidance machine M1 (Move w, Move n, Move n, Return) or M2 (Move w, Move s, Move s, Return); it Returns at the End of hallway.]
Hierarchies of Abstract Machines [Parr, Russell’97]
A machine is a partial policy represented by a Finite State Automaton.
Learning occurs within machines, as machines are only partially defined.
Flatten all machines out and consider joint states [s, m], where s is a world state and m a machine node → MDP.
Reduce (S × M): consider only states where the machine node is a choice node → Semi-MDP.
Learning = Semi-MDP Q-Learning.
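The reduction can be sketched in code. The machine encoding below is hypothetical (node kinds `'action'`, `'call'`, `'choice'` and the node names are assumptions, loosely modeled on the hallway example): only the (s, m) pairs where m is a choice node survive the reduction, and those are the Semi-MDP states over which Q-learning runs.

```python
# Hypothetical encoding of one HAM: each node is (kind, payload...).
machine = {
    'start':   ('action', 'move_e', 'check'),       # do move_e, go to 'check'
    'check':   ('choice', ['call_m1', 'call_m2']),  # learner picks next node
    'call_m1': ('call', 'M1', 'start'),             # invoke sub-machine M1
    'call_m2': ('call', 'M2', 'start'),             # invoke sub-machine M2
}

def choice_states(world_states, machine):
    """Reduce S x M: keep only joint states [s, m] where m is a choice
    node -- the decision points of the induced Semi-MDP."""
    return [(s, m) for s in world_states
                   for m, node in machine.items()
                   if node[0] == 'choice']
```

For two world states `['s0', 's1']`, the reduced state set contains only the pairs with the single choice node `'check'`; action and call nodes contribute no decision points.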
Task Hierarchy: MAXQ Decomposition [Dietterich'00]
[Figure: task hierarchy for a delivery robot —
  Root → Fetch, Deliver
  Fetch → Navigate(loc), Take
  Deliver → Navigate(loc), Give
  Take → Extend-arm, Grab
  Give → Extend-arm, Release
  Navigate(loc) → Move e, Move w, Move s, Move n]
Children of a task are unordered.
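The MAXQ decomposition splits the value of invoking a subtask into two learned parts: the value of the subtask itself and the completion value of the parent afterwards, i.e. Q(parent, s, a) = V(a, s) + C(parent, s, a). A minimal sketch, with hypothetical tables `V` and `C` and task names taken from the hierarchy above:

```python
from collections import defaultdict

V = defaultdict(float)  # V[(task, s)]: expected reward while doing task from s
C = defaultdict(float)  # C[(parent, s, subtask)]: reward for completing parent
                        # after subtask finishes

def maxq_q(parent, s, subtask):
    """MAXQ decomposition: Q(parent, s, a) = V(a, s) + C(parent, s, a)."""
    return V[(subtask, s)] + C[(parent, s, subtask)]
```

For example, if Navigate is worth 2.0 from state s0 and completing Fetch afterwards is worth 1.5, then Q(Fetch, s0, Navigate) = 3.5. The point of the decomposition is that V and C are shared and learned separately, so subtask values are reused across parents.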
"... consider maze domains. Reinforcement learning researchers, including this author, have spent countless years of research solving a solved problem! Navigating in grid worlds, even with stochastic dynamics, has been far from rocket science since the advent of search techniques such as A*." -- David Andre
Use planners, theorem provers, etc., as components in a large hierarchical solver.