Games with partial information
State of the art for games with partial information, from a mathematical viewpoint.
Published in: Education, Technology
Transcript

  • 1. Sequential decision making: decidability and complexity. Games with partial observation. Olivier.Teytaud@inria.fr + too many people to be all cited. Includes Inria, Cnrs, Univ. Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project. TAO, Inria-Saclay IDF, Cnrs 8623, LRI, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. Paris, September 2012.
  • 2. A quite general model: a directed graph (finite); a starting point on the graph; a target (or several targets, with different rewards). I want to reach a target. Labels (= decisions) on edges: next node = f(current node, decision). Each node is either: - a random node (random decision) - a decision node (I choose a decision) - an opponent node (an opponent chooses)
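The model on this slide can be sketched in a few lines of Python. All names here (`Node`, `play`, the `kind` strings) are illustrative, not taken from any cited work; targets are modeled as nodes with no outgoing edges.

```python
# Minimal sketch of the slide's game model: a finite directed graph with
# labeled edges, where each node is a decision, opponent, or random node.
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # "decision", "opponent", or "random"
    edges: dict = field(default_factory=dict)   # decision label -> successor node id
    reward: float = 0.0                         # nonzero only at target nodes

def play(nodes, start, my_policy, opp_policy, max_steps=1000):
    """Follow edges from `start` until a target (no outgoing edge) is reached."""
    current = start
    for _ in range(max_steps):
        node = nodes[current]
        if not node.edges:          # target reached
            return node.reward
        if node.kind == "decision":
            label = my_policy(current)
        elif node.kind == "opponent":
            label = opp_policy(current)
        else:                       # random node: uniform choice among labels
            label = random.choice(list(node.edges))
        current = node.edges[label]
    return 0.0                      # endless loop treated as no reward

# Tiny example: choosing "a" at the root leads to the rewarding target.
nodes = {
    0: Node("decision", {"a": 1, "b": 2}),
    1: Node("decision", {}, 1.0),
    2: Node("decision", {}, 0.0),
}
```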
  • 3. Partial observation. Each decision node is equipped with an observation; you can make decisions using the list of past observations ==> you don't know where you are in the graph.
  • 4. Overview ● 10%: overview of alternating Turing machines & computational complexity (great tool for complexity upper bounds) ● 50%: general culture on games (including undecidability) ● 35%: general culture on fictitious play (matrix games) (probably no time for this...) ● 4%: my results on that stuff ==> 2 detailed proofs (one new) ==> feel free to interrupt
  • 5. Outline● Complexity and ATM● Complexity and games (incl. planning)● Bounded horizon games
  • 6. Classical complexity classes P ⊂ NP ⊂ PSPACE ⊂ EXPTIME ⊂ NEXPTIME ⊂ EXPSPACE Proved: PSPACE ≠ EXPSPACE P ≠ EXPTIME NP ≠ NEXPTIME Believed, not proved: P≠NP EXPTIME≠NEXPTIME NEXPTIME≠EXPSPACE
  • 7. Complexity and alternating Turing machines ● Turing machine (TM) = abstract computer ● Non-deterministic Turing machine (NTM) = TM with “exists” states (i.e. several transitions; accepts if at least one accepts) ● Co-NTM: TM with “for all” states (i.e. several transitions; accepts if all lead to accept) ● ATM: TM with both “exists” and “for all” states.
  • 11. Alternation
  • 12. Non-determinism & alternation
  • 13. Outline● Complexity and ATM● Complexity and games (incl. planning)● Bounded horizon games
  • 14. Computational complexity: framework Uncertainty can be: – Adversarial: I focus on worst case – Stochastic: I focus on average result – Or both. “Stochastic = adversarial” if goal = 100% success. “Stochastic != adversarial” in the general case.
  • 15. Computational complexity: framework. Many representations for problems. E.g.: – Succinct: a circuit computes the ith bit of the probability that action a leads to a transition from s to s' – Compressed: a circuit computes many bits simultaneously – Flat: longer encoding (transition tables) ==> does not matter for decidability ==> matters for complexity
  • 16. Computational complexity: framework Many representations for problems. E.g.: – Succinct – Compressed – Flat Compressed representation “somehow” natural (state space has exponential size, transitions are fast): see e.g. Mundhenk for detailed defs and flat representations.
  • 17. Computational complexity: framework. We use mainly the compressed representation; see also Mundhenk for flat representations. Typically, exponentially smaller representations lead to exponentially higher complexity ==> but it's not always the case... Simple things can change the complexity a lot: “superko”: rules forbid repeating the same position; some fully observable two-player games become EXPSPACE-complete instead of EXP ==> discussed later
  • 18. Computational complexity: framework for the first tables of results. Either search (find a target) or optimize (cumulate rewards over time). Compressed (written with circuits or otherwise...) or not (flat). Horizon: - short horizon: horizon ≤ size of input - long horizon: log2(horizon) ≤ size of input - infinite horizon: no limit
  • 19. Mundhenk's summary: one player, limited horizon: expected reward > 0?
  • 20. Mundhenk's summary: one player, non-negative reward, looking for non-negative average reward (= positive probability of reaching): easier
  • 21. Complexity, partial observation, infinite horizon, probability of reaching a target ● 1P+random, unobservable: undecidable (Madani et al.) ● 1P+random, P(win)=1, or equivalently 2P, P(win)=1: [Rintanen and refs therein] – Fully observable: EXP [Littman, 1994] – Unobservable: EXPSPACE [Haslum et al., 2000] – Partial observability: 2EXP [Rintanen, 2003] Rmk: “2P, P(win)=1” is not “2P”!
  • 22. Complexity, partial observation, infinite horizon ● 2P vs 1P, P(win)=1?: undecidable! [Hearn, Demaine] ● 2P (random or not): – Existence of a sure win: equivalent to 1P+random! ● EXP fully observable (e.g. Go, Robson 1984) ● PSPACE unobservable ● 2EXP partially observable – Existence of a sure win, same state forbidden: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...) – General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
  • 23. Complexity, partial observation. Remarks: ● Continuous case? ● Purely epistemic (we gather information, we don't change the state)? [Sabbadin et al.] ● Restrictions on the policy, on the set of actions... ● Discounted reward ● DEC-POMDP, POSG: many players, same/opposite/different reward functions...
  • 24. What are the approaches? – Dynamic programming (Massé, Bellman, 50s) (still the main approach in industry), alpha-beta, retrograde analysis – Reinforcement learning – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006) – Scripts + tuning / direct policy search – Coevolution. All have their PO extensions, but the last two are the most convenient in this case.
  • 25. Partially observable games. Many tools for fully observable games. Not so many for partially observable ones. ● Shi-Fu-Mi (Rock-Paper-Scissors) ● Card games ● Phantom games
  • 26. Shi-Fu-Mi (Rock-Paper-Scissors) ● Fully observable in simultaneous play, but partially observable in the turn-based version. ● Computers stronger than humans (yes, it's true).
  • 27. Card games, phantom games ● Phantomized version of a game: – You don't see the moves of your opponent – If you play an illegal move, you are informed that it's illegal and you play again – Usually, you get a bit more information (captures, threats...) <== game-dependent ● Phantom games: – phantom-Chess = Kriegspiel ==> Dark Chess: more info – phantom-Go – etc.
  • 28. Partially observable games● Usually quite heuristic algorithms● Best performing algorithms combine: – Opponent modelling (as for Shi-Fu-Mi) – Belief state (often by Monte-Carlo simulations) – Not a lot of tree search – A lot of tuning ==> usually no consistency analysis
  • 29. Part I: Complexity analysis (unbounded horizon) – Game: ● One or two players ● Win, loss, draw (incl. endless loop) – Partial observability, no random part – Finite state space: ● state = transition(state, action) ● action decided by each player in turn
  • 30. State of the art - makes sense in fully observable games - not so much in non-observable games
  • 31. State of the art EXPTIME-complete in the general fully-observable case
  • 32. EXPTIME-complete fully observable games - Chess (for some n x n generalization) - Go (with no superko) - Draughts (international or English) - Chinese checkers - Shogi
  • 33. PSPACE-complete fully observable games - Amazons - Hex - Go-moku - Connect-6 - Qubic - Reversi - Tic-Tac-Toe. Many games with filling of each cell once and only once: polynomial horizon + full observation ==> PSPACE.
  • 34. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form, infinite horizon). (Still for the 100%-win “UD” criterion - for not fully observable cases it is necessary to be precise...) Importantly, the UD criterion means that strategies are the same whether the opponent has full observation or no observation ==> UD is very bad :-(
  • 35. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF: (I) First note that strategies are just sequences of actions (no observability!). (II) It is in EXPSPACE = NEXPSPACE, because of the following algorithm: (a) Non-deterministically choose the sequence of actions (an exponential list of actions is enough...) (b) Check the result against all possible strategies. (III) We have to check the hardness only.
  • 38. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF of the hardness: reduction from: is my TM with exponential tape going to halt? Consider a TM with tape of size N = 2^n. We must find a game - with size n (n = log2(N)) - such that player 1 has a winning strategy iff the TM halts.
  • 39. EXPSPACE-complete unobservable games (Haslum & Jonsson): encoding a Turing machine with tape of size N as a game with state O(log(N)). Player 1 chooses the sequence of configurations of the tape (N=4): x(0,1), x(0,2), x(0,3), x(0,4) ==> initial state; x(1,1), x(1,2), x(1,3), x(1,4); x(2,1), x(2,2), x(2,3), x(2,4); x(3,1), x(3,2), x(3,3), x(3,4); ..... x(N,1), x(N,2), x(N,3), x(N,4). Wins by final state! Except if P2 finds an illegal transition! ==> P2 can check the consistency of one 3-uple per line ==> requests space log(N) (= position of the 3-uple)
  • 42. EXPSPACE-complete unobservable games. The 1P + unknown-initial-state unobservable case is EXPSPACE-complete (games in succinct form). 2P + unobservable as well.
  • 43. 2EXPTIME-complete PO games The two-player PO case, or 1P+random PO is 2EXP-complete (games in succinct form). (2P = 1P+random because of UD)
  • 44. Undecidable games (B. Hearn) The three-player PO case is undecidable. (two players against one, not allowed to communicate)
  • 45. Hummm ? Do you know a PO game in which you can ensure a win with probability 1 ?
  • 46. Another formalization ==> much more satisfactory (might have drawbacks as well...)
  • 47. Madani et al. 1 player + random = undecidable (even without opponent!)
  • 48. Madani et al. 1 player + random = undecidable. ==> answers a (related) question by Papadimitriou and Tsitsiklis. Proof ? Based on the emptiness problem for probabilistic finite automata (see Paz 71): Given a probabilistic finite automaton, is there a word accepted with proba at least c ? ==> undecidable
  • 49. Consequence for unobservable games: 1 player + random = undecidable ==> 2 players = undecidable.
  • 50. Proof of “undecidability with 1 player against random” ==> “undecidability with 2 players”. How to simulate 1 player + random with 2 players?
  • 53. A random node to be rewritten. Rewritten as follows: ● Player 1 chooses a in [[0,N-1]] ● Player 2 chooses b in [[0,N-1]] ● c = (a+b) modulo N ● Go to tc. Each player can force the game to be equivalent to the initial one (by playing uniformly) ==> the probability of winning for player 1 (in case of perfect play) is the same as for the initial game ==> undecidability!
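The key fact behind this rewriting can be checked directly: if either player randomizes uniformly over [[0,N-1]], then c = (a+b) mod N is uniform regardless of what the other player does. A small sketch (illustrative function names, exact rational arithmetic to avoid float noise):

```python
# Sketch of the slide's construction: a uniform random transition among N
# successors is replaced by two antagonistic choices a and b, with the
# successor index c = (a + b) mod N.
from fractions import Fraction

def distribution_of_c(p1, p2, N):
    """Distribution of c = (a+b) mod N when a ~ p1 and b ~ p2 (independent)."""
    dist = [Fraction(0)] * N
    for a, pa in enumerate(p1):
        for b, pb in enumerate(p2):
            dist[(a + b) % N] += pa * pb
    return dist

N = 4
uniform = [Fraction(1, N)] * N
fixed_b = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]  # any pure strategy
# Player 1 uniform ==> c uniform, whatever player 2 does:
assert distribution_of_c(uniform, fixed_b, N) == uniform
```

This is why each player can unilaterally force the two-player game to behave exactly like the original 1P+random game, transferring the undecidability result.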
  • 54. Important remark Existence of a strategy for winning with proba 0.5 = also undecidable for the restriction to games in which the proba is >0.6 or <0.4 ==> not just a subtle precision trouble.
  • 55. So what? We have seen that unbounded horizon + partial observability + natural criterion (not sure win) ==> undecidability, contrary to what is expected from usual definitions. What about bounded horizon, 2P? – Clearly decidable – Complexity? – Algorithms? (==> coevolution & LP)
  • 56. Complexity (2P, 0-sum, no random)

                                 Unbounded            Exponential            Polynomial
                                 horizon              horizon                horizon
    Full observability           EXP                  EXP                    PSPACE
    No obs (X=100%)              EXPSPACE             NEXP
                                 (Haslum et al, 2000)
    Partially observable         2EXP                 EXPSPACE
    (X=100%)                     (Rintanen)           (Mundhenk)
    Simult. actions              ? EXPSPACE?          <= EXP                 <= EXP
    No obs                       undecidable          <= 2EXP (LP)           <= EXP (LP)
                                                      (concise matrix games)
    Partially observable         undecidable          <= 2EXP (LP)           <= EXP (LP)
  • 57. Part II: Fictitious play (bounded horizon) in the antagonist case. Fictitious play? Somehow an abstract version of antagonist coevolution with full memory ● unlimited population (finite, but increasing): one more individual per iteration ● perfect choice of each mutation against the current population of opponents
  • 58. Part II: Fictitious play in the zero-sum case. Why zero-sum cases? Evolutionarily stable solutions (found by FP) are usually sub-optimal (as is nature, for choosing lions' strategies or cheating behaviors in the Scaly-breasted Munia)
  • 59. What is a matrix 0-sum game? ● A matrix M is given (type n x m). ● Player 1 chooses (privately) i in [[1,n]] ● Player 2 chooses (privately) j in [[1,m]] ● Reward = Mij for player 1, = -Mij for player 2 (zero-sum game) ==> Model for finite antagonist games
  • 60. Nash equilibrium ● Nash equilibrium: there is a probability distribution for each player (= mixed strategy) such that the reward is optimal (for the worst case over the opponent's probability distribution) ● Linear programming is a polynomial algorithm for finding the Nash equilibrium ● FP = tool for approximating it (at least in 0-sum cases)
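The LP formulation for zero-sum matrix games can be sketched as follows, assuming scipy is available; `solve_zero_sum` is an illustrative name. The row player maximizes the game value v subject to x^T M >= v against every pure column.

```python
# Sketch: solving a zero-sum matrix game by linear programming.
# Variables: the row player's mixed strategy x (n entries) plus the value v;
# we maximize v (i.e. minimize -v) subject to x^T M >= v componentwise.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # minimize -v
    # for every column j: v - sum_i x_i M[i,j] <= 0
    A_ub = np.hstack([-M.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]         # x is a distribution, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Matching pennies: optimal mixed strategy is (1/2, 1/2), value 0.
x, value = solve_zero_sum([[1, -1], [-1, 1]])
```

The column player's strategy can be obtained from the dual (or by solving the transposed game with signs flipped).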
  • 61. Fictitious play (Brown 1949)● Each player starts with a distribution on its strategies● Each player in turn: – Finds an optimal strategy against the current opponents distribution (randomly break ties) – Adds it to its distribution (the distrib. does not sum to 1!)
  • 62. Matching pennies: M = (1 -1; -1 1) (i.e. player 1 wins iff i=j) ● HT1=(1,0) HT2=(0,1) ● HT1=(1,1) HT2=(0,2) ● HT1=(1,2) HT2=(1,2) ● HT1=(1,3) HT2=(2,2) ● HT1=(1,4) HT2=(3,2) ● HT1=(2,4) HT2=(4,2) ● HT1=(3,4) HT2=(5,2) ● HT1=(4,4) HT2=(6,2) ● HT1=(5,4) HT2=(6,3) .......
  • 65. Rock-paper-scissors ● Initial counts: Rock: 1, Paper: 0, Scissors: 0 ● RPS1=(1,0,0) RPS2=(1,0,0) ● RPS1=(1,1,0) RPS2=(1,1,0) ● RPS1=(1,2,0) RPS2=(1,1,1) ● RPS1=(1,3,0) RPS2=(1,1,2) ● RPS1=(2,3,0) RPS2=(1,2,2) ● … ===> converges to Nash (Robinson 51)
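The fictitious-play loop from slide 61 can be sketched directly; `fictitious_play` is an illustrative name, and ties are broken by first index rather than randomly, a simplification of the slide's "randomly break ties".

```python
# Minimal sketch of fictitious play on a zero-sum matrix game: each player
# best-responds to the opponent's empirical counts, then appends the chosen
# pure strategy to its own counts (the counts do not sum to 1!).
import numpy as np

def fictitious_play(M, iterations=3000):
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    counts1, counts2 = np.ones(n), np.ones(m)   # starting "distributions"
    for _ in range(iterations):
        # player 1 (maximizer): best response to player 2's empirical mix
        counts1[np.argmax(M @ counts2)] += 1
        # player 2 (minimizer): best response to player 1's empirical mix
        counts2[np.argmin(counts1 @ M)] += 1
    return counts1 / counts1.sum(), counts2 / counts2.sum()

rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]])
# Empirical frequencies converge toward the Nash equilibrium (1/3, 1/3, 1/3),
# as guaranteed for zero-sum games by Robinson (1951).
x, y = fictitious_play(rps)
```

Note that only the empirical frequencies converge; the per-iteration pure strategies keep cycling (Rock, Paper, Scissors, ...).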
  • 66. Fictitious play TODO
  • 69. Improvements for KxK matrix games: approximations ● There exist ε-approximations of support size O(log(K)/ε²) [Althoefer] ● Such an approximation can be found in time O(K log K / ε²) [Grigoriadis et al.]: basically a stochastic FP ● Exact solution in time O(K log K · k^(2k) + poly(k)) (Auger, Ruette, Teytaud) if the solution is k-sparse (good only if k is smaller than log(K)/log(log(K))! better?)
  • 70. Improvements for KxK matrix games: approximations. So, LP & FP are two tools for matrix games. LP can be adapted to PO games without building the complete matrix (using information sets). The same for FP variants?
  • 71. Conclusions There are still natural questions which provide nice decidability problems Madani et al (1 player against random, no observability), extended here to 2 players with no random ==> undecidable problems “less than” the Halting problem ? Solving zero-sum matrix-games is still an active area of research ● Approximate cases ● Sparse case
  • 72. Open problems● Phantom-Go undecidable ? (or other “real” game...)● Complexity of Go with Chinese rules ? (conjectured: PSPACE or EXPTIME; proved PSPACE-hard + EXPSPACE)● More to say about “epistemic” games (internal state not modified)● Frontier of undecidability in PO games ? (100% halting game: 2P become decidable)● Chess with finitely many pieces on infinite board: decidability of forced-mate ? (n-move: Brumleve et al, 2012, simulation in Presburger (thanks S. Riis :-) )
