Results IPPC 2011: MDPs and POMDPs
  1. Results from the International Probabilistic Planning Competition, IPPC 2011 (@raimonbosch)
  2. Why Markov domains? (Crossing Traffic)
     (1) Solutions are functions (policies) mapping states into actions.
     (2) Given an observation, stochastic behaviors can emerge.
     Missing information / stochastic behavior: we can't predict state n+1, but we can obtain better rewards depending on the policy!
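     To make slide 2 concrete, here is a toy Python sketch of how expected reward depends on the policy even though the next state cannot be predicted. All state names, actions, and rewards are invented for illustration; they are not part of the Crossing Traffic domain spec.

         import random

         # A policy is just a mapping from states to actions.
         policy_a = {"car_near": "wait", "car_far": "cross"}
         policy_b = {"car_near": "cross", "car_far": "cross"}

         def episode(policy):
             """One step of a toy crossing problem with a stochastic car position."""
             state = random.choice(["car_near", "car_far"])  # can't be predicted ahead
             action = policy[state]
             if action == "wait":
                 return 0.0                                  # safe, but no progress
             return 10.0 if state == "car_far" else -10.0    # crossing near a car hurts

         # Expected reward differs by policy: policy_a averages ~5, policy_b ~0.
         for p in (policy_a, policy_b):
             print(sum(episode(p) for _ in range(10_000)) / 10_000)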
  3. IPPC 2011: Domains and Evaluation
     • 8 domains
       – Traffic Control: highly exogenous, concurrent
       – Elevator Control: highly exogenous, concurrent
       – Game of Life: highly combinatoric
       – SysAdmin: highly exogenous, complex transitions
       – Navigation: goal-oriented, determinization killer
       – Crossing Traffic: goal-oriented, deterministic if move far left
       – Skill Teaching: few exogenous events
       – Reconnaissance: few exogenous events
     • Conditions
       – 24 hours for all runs
       – 10 instances per domain, 30 runs per instance
  4. Changes from IPPC 2008
     - Not goal-based.
     - Large branching factors.
     - Finite-horizon reward minimization.
     - More realistic planning scenarios.
  5. MDP winners
     1st: PROST (Eyerich, Keller – Uni. Freiburg): UCT / single-outcome determinization, caching
     2nd: Glutton (Kolobov, Dai, Mausam, Weld – UW): iterative-deepening RTDP, caching
  6. POMDP winners
     1st: POMDPX_NUS (Wu, W. S. Lee, D. Hsu – NUS): SARSOP / UCT (POMCP)
     2nd: KAIST-AILAB (D. Kim, K. Lee, K.-E. Kim – KAIST): Symbolic HSVI (ADDs), symmetry detection
  7. Understanding UCT: Monte-Carlo tree search
  8. Understanding UCT: Multi-armed bandit problem
  9. UCT algorithm by Kocsis and Szepesvári (2006)
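     At decision nodes UCT applies the standard UCB1 selection rule; with empirical value estimate Q̂(s,a), visit counts N, and exploration constant C it reads (this is the textbook form, nothing competition-specific):

         a^{*} = \operatorname*{arg\,max}_{a} \left[ \hat{Q}(s,a) + C \sqrt{\frac{\ln N(s)}{N(s,a)}} \right]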
  10. Parts of UCT
      (1) Monte-Carlo Tree Search
      (2) Performs rollouts in a tree of decision and chance nodes
      In decision nodes:
        * Choose any unvisited successor randomly if there is one
        * Choose the successor maximizing the UCB1 policy otherwise
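      A minimal Python sketch of the decision-node rule above. This is a generic illustration of UCT, not any competitor's code; the Stats record and function names are invented for the example.

          import math
          import random
          from dataclasses import dataclass

          @dataclass
          class Stats:
              visits: int = 0          # N(s, a)
              value_sum: float = 0.0   # sum of sampled returns through this action

          def select_action(children, C=math.sqrt(2)):
              """children maps each action to its Stats at one decision node."""
              untried = [a for a, st in children.items() if st.visits == 0]
              if untried:
                  return random.choice(untried)    # unvisited successors come first
              n_total = sum(st.visits for st in children.values())
              def ucb1(a):
                  st = children[a]
                  return st.value_sum / st.visits + C * math.sqrt(math.log(n_total) / st.visits)
              return max(children, key=ucb1)       # otherwise maximize UCB1

          # e.g. select_action({"left": Stats(3, 12.0), "right": Stats(1, 5.0)})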
  11. 1st MDP: PROST
      Domain-independent probabilistic planning based on UCT, combined with additional techniques:
      - Reasonable Action Pruning
      - Q-value Initialization
      - Search Depth Limitation
      - Reward Lock Detection
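      Of the four techniques, search depth limitation is the easiest to sketch: cut each rollout off at a fixed depth and fall back to an initialized value estimate. This is a hedged sketch of the idea only; the function names and the heuristic are placeholders, not PROST's implementation.

          def limited_rollout(state, policy, simulate, heuristic, depth_limit, d=0):
              """Monte-Carlo rollout truncated at depth_limit.

              simulate(state, action) -> (next_state, reward) is the domain model;
              heuristic(state) stands in for an initialized Q-value at the cutoff.
              """
              if d == depth_limit:
                  return heuristic(state)   # stop early, use the initialized estimate
              next_state, reward = simulate(state, policy(state))
              return reward + limited_rollout(next_state, policy, simulate,
                                              heuristic, depth_limit, d + 1)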
  12. 2nd MDP: GLUTTON
      LRTDP with reverse iterative deepening
      • Subsampling transition function
      • Correlated transition function samples
      • Caching
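      Subsampling the transition function targets the large branching factors from slide 4: with many independent stochastic variables, enumerating every joint successor state in a Bellman backup is infeasible, so the expectation is estimated from a few sampled successors. A minimal sketch; the argument names are assumptions, not Glutton's API.

          def sampled_q(state, action, reward, sample_next, value, n_samples=30):
              """Approximate Q(s, a) = R(s, a) + E[V(s')] from n sampled successors
              rather than summing over an exponentially large outcome set."""
              avg_v = sum(value(sample_next(state, action))
                          for _ in range(n_samples)) / n_samples
              return reward(state, action) + avg_v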
  13. POMDP Track: Challenges
      - Agent acting under uncertainty.
      - Stochastic sequential decision problems.
      - Very large number of states.
      - Compact representation needed.
  14. 1st POMDP: SARSOP
      Successive Approximations of the Reachable Space under Optimal Policies
      - Solves POMDPs by sampling the belief space.
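      Point-based solvers such as SARSOP operate on beliefs b(s), i.e. probability distributions over states, and sample which beliefs to back up. The update that produces the next belief after acting and observing is the standard Bayes filter sketched below (generic POMDP math, not SARSOP-specific; T, O, and states are assumed inputs).

          def belief_update(b, a, o, T, O, states):
              """b'(s2) ∝ O(o | s2, a) * sum_s T(s2 | s, a) * b(s), then normalize.

              b maps state -> probability; T(s2, s, a) and O(o, s2, a) are the
              transition and observation models (argument order chosen here).
              """
              unnorm = {s2: O(o, s2, a) * sum(T(s2, s, a) * b[s] for s in states)
                        for s2 in states}
              z = sum(unnorm.values())              # Pr(o | b, a)
              return {s2: p / z for s2, p in unnorm.items()}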
  15. 2nd POMDP: KAIST-AILAB
      Uses symbolic heuristic search value iteration (symbolic HSVI) for factored POMDPs:
      - Alpha-vector masking method.
      - Algebraic decision diagram (ADD) representation.
      - Elimination of symmetric structures in the domains.
  16. Thanks!
      [1] T. Keller and P. Eyerich, "PROST: Probabilistic Planning Based on UCT," in Proc. ICAPS, 2012.
      [2] A. Kolobov, P. Dai, Mausam, and D. S. Weld, "Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors," in Proc. ICAPS, 2012.
      [3] H. Kurniawati, D. Hsu, and W. S. Lee, "SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces," in Proc. Robotics: Science and Systems, 2008.
      [4] H. S. Sim, K.-E. Kim, J. H. Kim, D. S. Chang, and M. W. Koo, "Symbolic Heuristic Search Value Iteration for Factored POMDPs," in Proc. AAAI, 2008, pp. 1088–1093.