Results from the International Probabilistic Planning Competition (IPPC 2011) by @raimonbosch
Why Markov domains? (Crossing Traffic)
(1) Solutions are functions (policies) mapping states into actions.
(2) Given an observation, stochastic behaviors can emerge: with missing information or stochastic behavior we cannot predict state n+1. We can obtain better rewards depending on the policy!
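The point above can be sketched in a few lines: a policy is just a mapping from states to actions, and with a stochastic transition the next state can only be sampled, never predicted. The state names and the 70% success probability below are hypothetical, loosely inspired by Crossing Traffic, not taken from the competition domain.

```python
import random

# A policy maps states to actions; stochastic transitions mean the
# next state cannot be predicted, only sampled.
policy = {"at_home": "cross", "in_traffic": "wait"}  # toy states

def step(state, action, rng):
    """Hypothetical stochastic transition: crossing succeeds 70% of the time."""
    if action == "cross":
        return "safe" if rng.random() < 0.7 else "in_traffic"
    return state  # "wait" keeps the agent where it is

rng = random.Random(0)
# Five samples of the same (state, action) pair give different outcomes:
states = [step("at_home", policy["at_home"], rng) for _ in range(5)]
```

Running the policy many times and averaging the rewards is exactly how rollout-based planners such as UCT evaluate which policy is better.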
IPPC 2011: DOMAINS AND EVALUATION
• 8 domains
  – Traffic Control: highly exogenous, concurrent
  – Elevator Control: highly exogenous, concurrent
  – Game of Life: highly combinatoric
  – SysAdmin: highly exogenous, complex transitions
  – Navigation: goal-oriented, determinization killer
  – Crossing Traffic: goal-oriented, deterministic if move far left
  – Skill Teaching: few exogenous events
  – Reconnaissance: few exogenous events
• Conditions
  – 24 hours for all runs
  – 10 instances per domain, 30 runs per instance
Changes from IPPC 2008
- Not goal-based.
- Large branching factors.
- Finite-horizon reward maximization.
- More realistic planning scenarios.
Parts of UCT
(1) Monte-Carlo Tree Search
(2) Performs rollouts in a tree of decision and chance nodes
In decision nodes:
* Choose any unvisited successor randomly if there is one
* Choose the successor maximizing the UCB1 policy otherwise
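The decision-node rule above can be sketched directly: prefer unvisited children, otherwise maximize the UCB1 score (average reward plus an exploration bonus). This is a minimal illustration of the selection step, not PROST's actual implementation; the `Node` class and field names are assumptions.

```python
import math
import random

class Node:
    """A child of a decision node: tracks visits and cumulative reward."""
    def __init__(self, action):
        self.action = action
        self.visits = 0
        self.value = 0.0  # sum of rollout rewards through this child

def ucb1_select(children, parent_visits, c=math.sqrt(2)):
    """UCT selection at a decision node: pick any unvisited child at
    random; otherwise maximize value/visits + c * sqrt(ln(N) / n)."""
    unvisited = [ch for ch in children if ch.visits == 0]
    if unvisited:
        return random.choice(unvisited)
    return max(
        children,
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(parent_visits) / ch.visits),
    )
```

The exploration constant `c` trades off exploiting the best-looking action against revisiting rarely tried ones; rarely visited children get a larger bonus.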
2nd MDP: GLUTTON
LRTDP with reverse iterative deepening
• Subsampling the transition function
• Correlated transition function samples
• Caching
POMDP Track: Challenges
- Agent acting under uncertainty.
- Stochastic sequential decision problems.
- Very large number of states.
- Compact representation needed.
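Acting under uncertainty means the agent never knows its state; it maintains a belief, a probability distribution over states, updated by Bayes' rule after each action and observation: b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A minimal sketch for discrete POMDPs, assuming a hypothetical nested-dict layout for the transition and observation models:

```python
def belief_update(belief, action, observation, trans, obs_model):
    """Bayes-filter belief update for a discrete POMDP.
    trans[s][a] maps next states to probabilities;
    obs_model[s2][a] maps observations to probabilities."""
    new_b = {}
    reachable = {s2 for s in belief for s2 in trans[s][action]}
    for s2 in reachable:
        # Predict: probability of landing in s2 under the transition model.
        pred = sum(belief[s] * trans[s][action].get(s2, 0.0) for s in belief)
        # Correct: weight by the likelihood of the observation.
        new_b[s2] = obs_model[s2][action].get(observation, 0.0) * pred
    z = sum(new_b.values())
    return {s: p / z for s, p in new_b.items()} if z else new_b
```

The belief space is continuous and high-dimensional, which is why both winners below avoid exhaustive updates: SARSOP samples only reachable beliefs, and symbolic HSVI represents the models compactly with ADDs.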
1st POMDP: SARSOP
Successive Approximations of the Reachable Space under Optimal Policies
- Solves POMDPs by sampling the belief space.
2nd POMDP: KAIST-AILAB
Uses symbolic heuristic search value iteration (symbolic HSVI) for factored POMDPs
- Alpha-vector masking method.
- Algebraic decision diagram (ADD) representation.
- Elimination of symmetric structures in the domains.
Thanks!

References:
T. Keller and P. Eyerich, “PROST: Probabilistic Planning Based on UCT,” ICAPS’12, 2012.
A. Kolobov, P. Dai, Mausam, and D. S. Weld, “Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors,” in Twenty-Second International Conference on Automated Planning and Scheduling, 2012.
H. Kurniawati, D. Hsu, and W. S. Lee, “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces,” in Proc. Robotics: Science and Systems, 2008, vol. 62.
H. S. Sim, K. E. Kim, J. H. Kim, D. S. Chang, and M. W. Koo, “Symbolic heuristic search value iteration for factored POMDPs,” in Proc. Nat. Conf. on Artificial Intelligence, 2008, pp. 1088–1093.