Complexity of planning and games with partial information

Survey of computational complexity or computability of sequential decision making (games, planning).

Contains two more detailed proofs:
- EXPSPACE-completeness of unobservable adversarial planning for existence of a 100% winning strategy (Haslum et al.)
- undecidability of unobservable adversarial planning for arbitrary winning rate (including optimal play in the Nash sense)


  1. Sequential decision making: decidability and complexity. Searching with partial observation. Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, Cnrs, Univ. Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project. TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. Bielefeld, September 2012.
  2. A quite general model. A finite directed graph. A starting point on the graph, a target (or several targets, with different rewards). I want to reach a target. Labels (= decisions) on edges: next node = f(current node, decision). Each node is either: - a random node (random decision) - a decision node (I choose a decision) - an opponent node (an opponent chooses)
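The model on this slide can be sketched as a small data structure. The following is only an illustration under stated assumptions: the names `Kind`, `Node`, `play` and the toy `graph` are hypothetical and not from the slides, random nodes are assumed uniform, and the players here see the current node (partial observation is not modelled).

```python
import random
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    DECISION = "decision"   # I choose the outgoing edge
    OPPONENT = "opponent"   # an opponent chooses the outgoing edge
    RANDOM = "random"       # the edge is drawn at random (assumed uniform)
    TARGET = "target"       # terminal node carrying a reward


@dataclass
class Node:
    kind: Kind
    edges: dict = field(default_factory=dict)  # decision label -> next node name
    reward: float = 0.0                        # only meaningful for targets


def play(graph, start, me, opponent, rng, max_steps=100):
    """Follow next_node = f(current_node, decision) until a target is reached."""
    current = start
    for _ in range(max_steps):
        node = graph[current]
        if node.kind is Kind.TARGET:
            return node.reward
        labels = sorted(node.edges)
        if node.kind is Kind.DECISION:
            label = me(current, labels)
        elif node.kind is Kind.OPPONENT:
            label = opponent(current, labels)
        else:
            label = rng.choice(labels)
        current = node.edges[label]
    return 0.0  # step budget exhausted without reaching a target


# Hypothetical toy instance: one decision, then either an opponent or chance.
graph = {
    "s": Node(Kind.DECISION, {"left": "o", "right": "r"}),
    "o": Node(Kind.OPPONENT, {"a": "win", "b": "lose"}),
    "r": Node(Kind.RANDOM, {"h": "win", "t": "lose"}),
    "win": Node(Kind.TARGET, reward=1.0),
    "lose": Node(Kind.TARGET, reward=0.0),
}
```

For instance, `play(graph, "s", lambda n, ls: "left", lambda n, ls: "b", random.Random(0))` follows the decision edge `left` and then lets the opponent pick `b`.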
  3. Partial observation. Each decision node is equipped with an observation; you can make decisions using the list of past observations ==> you don't know where you are in the graph
  4. Overview ● 10%: overview of alternating Turing machines & computational complexity (great tool for complexity upper bounds) ● 50%: general culture on games (including undecidability) ● 35%: general culture on fictitious play (matrix games) (probably no time for this...) ● 4%: my results on that stuff ==> 2 detailed proofs (one new) ==> feel free to interrupt
  5. Outline ● Complexity and ATM ● Complexity and games (incl. planning) ● Bounded horizon games
  6. Classical complexity classes: P ⊆ NP ⊆ PSPACE ⊆ EXPTIME ⊆ NEXPTIME ⊆ EXPSPACE. Proved: PSPACE ≠ EXPSPACE, P ≠ EXPTIME, NP ≠ NEXPTIME. Believed, not proved: P ≠ NP, EXPTIME ≠ NEXPTIME, NEXPTIME ≠ EXPSPACE.
  7. Complexity and alternating Turing machines ● Turing machine (TM) = abstract computer ● Non-deterministic Turing machine (NTM) = TM with “exists” states (i.e. several transitions, accepts if at least one transition accepts) ● Co-NTM: TM with “for all” states (i.e. several transitions, accepts if all transitions lead to accept) ● ATM (alternating Turing machine): TM with both “exists” and “for all” states.
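The acceptance rule for “exists” and “for all” states can be illustrated on a finite, acyclic graph of configurations. This is a minimal sketch, not a Turing machine simulator: the names `accepts`, `kind`, `successors` and `accepting` are hypothetical, and the configuration graph is assumed to be given explicitly.

```python
def accepts(config, kind, successors, accepting):
    """Alternating acceptance on a finite, acyclic configuration graph.

    kind[config] is "exists" or "forall"; successors[config] lists the
    configurations reachable in one transition.
    """
    if config in accepting:
        return True
    nxt = successors.get(config, [])
    if not nxt:
        return False  # stuck in a non-accepting configuration
    results = (accepts(c, kind, successors, accepting) for c in nxt)
    # "exists": one accepting branch is enough; "forall": every branch must accept.
    return any(results) if kind[config] == "exists" else all(results)


# Hypothetical toy instance.
successors = {"root": ["u", "v"], "u": ["yes", "no"], "v": ["yes", "yes2"]}
accepting = {"yes", "yes2"}
kind = {"root": "exists", "u": "forall", "v": "forall"}
print(accepts("root", kind, successors, accepting))
```

With only “exists” states this is the NTM rule, with only “for all” states the co-NTM rule; mixing both gives alternation, which is also how a two-player game tree is evaluated.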
  11. Alternation
  12. Non-determinism & alternation
  13. Outline ● Complexity and ATM ● Complexity and games (incl. planning) ● Bounded horizon games
  14. Computational complexity: framework. Uncertainty can be: – Adversarial: I focus on worst case – Stochastic: I focus on average result – Or both. “Stochastic = adversarial” if goal = 100% success. “Stochastic != adversarial” in the general case.
  15. Computational complexity: framework. Many representations for problems. E.g.: – Succinct: a circuit computes the ith bit of the proba that action a leads to a transition from s to s' – Compressed: a circuit computes many bits simultaneously – Flat: longer encoding (transition tables) ==> does not matter for decidability ==> matters for complexity
  16. Computational complexity: framework. Many representations for problems. E.g.: – Succinct – Compressed – Flat. Compressed representation is “somehow” natural (state space has exponential size, transitions are fast): see e.g. Mundhenk for detailed defs and flat representations.
  17. Computational complexity: framework. We mainly use the compressed representation; see also Mundhenk for flat representations. Typically, exponentially smaller representations lead to exponentially higher complexity ==> but it's not always the case... Simple things can change the complexity a lot: “superko” (rules forbid repeating the same position): some fully observable 2-player games become EXPSPACE instead of EXP ==> discussed later
  18. Computational complexity: framework for first tables of results. Either search (find a target) or optimize (cumulate rewards over time). Compressed (written with circuits or others...) or not (flat). Horizon: - Short horizon: horizon ≤ size of input - Long horizon: log2(horizon) ≤ size of input - Infinite horizon: no limit
  19. Mundhenk's summary: one player, limited horizon: expected reward > 0?
  20. Mundhenk's summary: one player, non-negative reward, looking for non-neg. average reward (= positive proba of reaching): easier
  21. Complexity, partial observation, infinite horizon, proba of reaching a target ● 1P+random, unobservable: undecidable (Madani et al.) ● 1P+random, P(win)=1, or equivalently 2P, P(win)=1: [Rintanen and refs therein] – Fully observable: EXP [Littman94] – Unobservable: EXPSPACE [Haslum et al. 2000] – Partial observability: 2EXP [Rintanen, 2003] Rmk: “2P, P(win)=1” is not “2P”!
  22. Complexity, partial observation, infinite horizon ● 2P vs 1P, P(win)=1?: undecidable! [Hearn, Demaine] ● 2P (random or not): – Existence of sure win: equiv. to 1P+random! ● EXP fully observable (e.g. Go, Robson 1984) ● EXPSPACE unobservable ● 2EXP partially observable – Existence of sure win, same state forbidden: EXPSPACE-complete (Go with Chinese rules? rather conjectured EXPTIME or PSPACE...) – General case (optimal play): undecidable (Auger, Teytaud) (what about phantom-Go?)
  23. Complexity, partial observation. Remarks: ● Continuous case? ● Purely epistemic (we gather information, we don't change the state)? [Sabbadin et al.] ● Restrictions on the policy, on the set of actions... ● Discounted reward ● DEC-POMDP, POSG: many players, same/opposite/different reward functions...
  24. What are the approaches? – Dynamic programming (Massé – Bellman 50s) (still the main approach in industry), alpha-beta, retrograde analysis – Reinforcement learning – MCTS (R. Coulom. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, 2006) – Scripts + Tuning / Direct Policy Search – Coevolution. All have their PO extensions but the last two are the most convenient in this case.
  25. Partially observable games. Many tools for fully observable games. Not so many for partially observable ones. ● Shi-Fu-Mi (Rock-Paper-Scissors) ● Card games ● Phantom games
  26. Shi-Fu-Mi (Rock-Paper-Scissors) ● Fully observable in simultaneous play, but partially observable in turn-based version. ● Computers stronger than humans (yes, it's true).
  27. Card games, phantom games ● Phantomized version of a game: – You don't see the moves of your opponents – If you play an illegal move, you are informed that it's illegal, and you play again – Usually, you get a little more information (captures, threats...) <== game-dependent ● Phantom-games: – phantom-Chess = Kriegspiel ==> Dark Chess: more info – phantom-Go – etc.
  28. Partially observable games ● Usually quite heuristic algorithms ● Best performing algorithms combine: – Opponent modelling (as for Shi-Fu-Mi) – Belief state (often by Monte-Carlo simulations) – Not a lot of tree search – A lot of tuning ==> usually no consistency analysis
  29. Part I: Complexity analysis (unbounded horizon) – Game: ● One or two players ● Win, loss, draw (incl. endless loop) – Partial observability, no random part – Finite state space: ● state = transition(state, action) ● action decided by each player in turn
  30. State of the art - makes sense in fully observable games - not so much in non-observable games
  31. State of the art: EXPTIME-complete in the general fully observable case
  32. EXPTIME-complete fully observable games - Chess (for some nxn generalization) - Go (with no superko) - Draughts (international or English) - Chinese checkers - Shogi
  33. PSPACE-complete fully observable games - Amazons - Hex - Go-moku - Connect-6 - Qubic - Reversi - Tic-Tac-Toe. Polynomial horizon + full observation ==> PSPACE. Many games with filling of each cell once and only once.
  34. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form, infinite horizon). (still for the 100% win “UD” criterion - for not fully observable cases it is necessary to be precise...) Importantly, the UD criterion means that strategies are the same if the opponent has full observation as if he has no observation ==> UD is very bad :-(
  35. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF: (I) First note that strategies are just sequences of actions (no observability!) (II) It is in EXPSPACE=NEXPSPACE, because of the following algorithm: (a) Non-deterministically choose the sequence of actions (an exponential list of actions is enough...) (b) Check the result against all possible strategies (III) We have to check the hardness only.
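Point (I) of the proof says that, without observations, a strategy is just a sequence of actions. A deterministic counterpart of the non-deterministic guess in (II) is a search over sets of possible states. The sketch below is an illustration under assumptions, not the algorithm from the cited proof: `conformant_plan` and the toy corridor are hypothetical, the state space is given explicitly (flat form), and `transition(state, action)` returns every state the opponent may move the game to.

```python
from collections import deque


def conformant_plan(initial_states, actions, transition, goals):
    """Breadth-first search over belief states (sets of possible states).

    Returns a list of actions that ends inside `goals` whatever the opponent
    does, or None if no such sequence exists.
    """
    start = frozenset(initial_states)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        belief, plan = queue.popleft()
        if belief <= goals:          # every state we might be in is a goal
            return plan
        for action in actions:
            nxt = frozenset(s2 for s in belief for s2 in transition(s, action))
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None


# Hypothetical toy instance: a corridor of 4 cells with walls, unknown start.
def move(state, action):
    return {max(state - 1, 0)} if action == "left" else {min(state + 1, 3)}


print(conformant_plan({0, 1, 2, 3}, ["left", "right"], move, {0}))
```

The set `seen` can hold up to 2^|S| belief states, which is why this explicit search is only practical for small flat games; the space bound on the slide comes from guessing the sequence and keeping only the current belief.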
  38. EXPSPACE-complete unobservable games (Haslum & Jonsson). The two-player unobservable case is EXPSPACE-complete (games in succinct form). PROOF of the hardness: reduction from: is my TM with exponential tape going to halt? Consider a TM with tape of size N=2^n. We must find a game - with size n (n = log2(N)) - such that player 1 has a winning strategy iff the TM halts.
  39. EXPSPACE-complete unobservable games (Haslum & Jonsson). Encoding a Turing machine with a tape of size N as a game with state O(log(N)). Player 1 chooses the sequence of configurations of the tape (N=4):
      x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
      x(1,1),x(1,2),x(1,3),x(1,4)
      x(2,1),x(2,2),x(2,3),x(2,4)
      x(3,1),x(3,2),x(3,3),x(3,4)
      .....................................
      x(N,1), x(N,2), x(N,3), x(N,4)
      Player 1 wins by reaching the final state, except if P2 finds an illegal transition! ==> P2 can check the consistency of one 3-tuple per line ==> requires space log(N) (= position of the 3-tuple)
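The reason P2 only needs one position per line is that each tape cell at step t+1 depends on just three cells at step t. The sketch below illustrates that local check; the cell encoding, `next_cell`, `illegal_position` and the one-rule machine `delta` are assumptions made for the example, not the construction from the cited proof. In the game P2 would pick a single index; the loop here scans all of them only to show what a violation looks like.

```python
def next_cell(left, mid, right, delta):
    """Cell at time t+1 computed from the three cells around it at time t.

    A cell is (symbol, state), with state None when the head is elsewhere.
    delta maps (state, symbol) to (new_state, new_symbol, move), move in {-1, +1}.
    """
    sym, st = mid
    if st is not None:
        if (st, sym) not in delta:      # halting state: nothing changes
            return mid
        _, new_sym, _ = delta[(st, sym)]
        return (new_sym, None)          # the head writes and leaves the cell
    for neighbour, arriving_move in ((left, +1), (right, -1)):
        if neighbour is not None and neighbour[1] is not None:
            key = (neighbour[1], neighbour[0])
            if key in delta and delta[key][2] == arriving_move:
                return (sym, delta[key][0])   # the head arrives on this cell
    return mid


def illegal_position(prev, nxt, delta):
    """Return one index where nxt contradicts prev, or None if the line is legal."""
    n = len(prev)
    for i in range(n):
        left = prev[i - 1] if i > 0 else None
        right = prev[i + 1] if i + 1 < n else None
        if nxt[i] != next_cell(left, prev[i], right, delta):
            return i
    return None


# Hypothetical one-rule machine: in state q0 on symbol "0", write "1", move right.
delta = {("q0", "0"): ("q0", "1", +1)}
prev = [("0", "q0"), ("0", None), ("0", None)]
honest = [("1", None), ("0", "q0"), ("0", None)]
cheat = [("1", None), ("0", "q0"), ("1", None)]
print(illegal_position(prev, honest, delta), illegal_position(prev, cheat, delta))
```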
  42. EXPSPACE-complete unobservable games. The 1P+unknown initial state in the unobservable case is EXPSPACE-complete (games in succinct form). 2P+unobservable as well.
  43. 2EXPTIME-complete PO games. The two-player PO case, or 1P+random PO, is 2EXP-complete (games in succinct form). (2P = 1P+random because of UD)
  44. Undecidable games (B. Hearn). The three-player PO case is undecidable. (two players against one, not allowed to communicate)
  45. Hummm? Do you know a PO game in which you can ensure a win with probability 1?
  46. Another formalization c ==> much more satisfactory (might have drawbacks as well...)
  47. Madani et al. c 1 player + random = undecidable (even without opponent!)
  48. Madani et al. 1 player + random = undecidable. ==> answers a (related) question by Papadimitriou and Tsitsiklis. Proof? Based on the emptiness problem for probabilistic finite automata (see Paz 71): given a probabilistic finite automaton, is there a word accepted with proba at least c? ==> undecidable
  49. Consequence for unobservable games c 1 player + random = undecidable ==> 2 players = undecidable.
  50. Proof of “undecidability with 1 player against random” ==> “undecidability with 2 players”. How to simulate 1 player + random with 2 players?
  53. A random node to be rewritten. Rewritten as follows: ● Player 1 chooses a in [[0,N-1]] ● Player 2 chooses b in [[0,N-1]] ● c = (a+b) modulo N ● Go to t_c. Each player can force the game to be equivalent to the initial one (by playing uniformly) ==> the proba of winning for player 1 (in case of perfect play) is the same as for the initial game ==> undecidability!
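The claim that either player can force the original behaviour by playing uniformly can be checked by direct computation. This sketch assumes the random node being replaced is uniform over its N successors; `outcome_distribution` and the example distributions are hypothetical names and values chosen for illustration.

```python
from fractions import Fraction


def outcome_distribution(p, q):
    """Distribution of c = (a + b) mod N when a ~ p and b ~ q independently."""
    n = len(p)
    dist = [Fraction(0)] * n
    for a in range(n):
        for b in range(n):
            dist[(a + b) % n] += p[a] * q[b]
    return dist


N = 4
uniform = [Fraction(1, N)] * N
# Any other mixed strategy, here arbitrarily biased towards 0.
biased = [Fraction(7, 10), Fraction(1, 10), Fraction(1, 10), Fraction(1, 10)]

# If one side plays uniformly, c is uniform whatever the other side does.
print(outcome_distribution(uniform, biased))
print(outcome_distribution(biased, uniform))
```

Exact fractions are used so that the equality with the uniform distribution holds without rounding error.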
  54. Important remark. Existence of a strategy for winning with proba 0.5 = also undecidable for the restriction to games in which the proba is >0.6 or <0.4 ==> not just a subtle precision trouble.
  55. So what? We have seen that unbounded horizon + partial observability + natural criterion (not sure win) ==> undecidability, contrary to what is expected from usual definitions. What about bounded horizon, 2P? – Clearly decidable – Complexity? – Algorithms? (==> coevolution & LP)
  56. Complexity (2P, 0-sum, no random). Columns: unbounded horizon | exponential horizon | polynomial horizon
      Full observability: EXP | EXP | PSPACE
      No obs (X=100%): EXPSPACE (Haslum et al., 2000) | NEXP
      Partially observable (X=100%): 2EXP (Rintanen) | EXPSPACE (Mundhenk)
      Simult. actions: ? EXPSPACE ? | <<<= EXP | <<<= EXP
      No obs: undecidable | <= 2EXP (LP) | <= EXP (LP) (concise matrix games)
      Partially observable: undecidable | <= 2EXP (LP) | <= EXP (LP)
  57. Part II: Fictitious play (bounded horizon) in the antagonist case. Fictitious play? Somehow an abstract version of antagonist coevolution with full memory ● unlimited population (finite, but increasing): one more indiv. per iteration ● perfect choice of each mutation against the current population of opponents
  58. Part II: Fictitious play in the zero-sum case. Why zero-sum cases? Evolutionary stable solutions (found by FP) are usually sub-optimal (as well as nature, for choosing lions' strategies or cheating behaviors in Scaly-breasted Munia)
  59. What is a matrix 0-sum game? ● A matrix M is given (type n x m). ● Player 1 chooses (privately) i in [[1,n]] ● Player 2 chooses j in [[1,m]] ● Reward = Mij for player 1 = -Mij for player 2 (zero-sum game) ==> Model for finite antagonist games
  60. Nash equilibrium ● Nash equilibrium: there is a distribution of probability for each player (= mixed strategy) such that the reward is optimum (for the worst case on the distribution of probabilities by the opponent) ● Linear programming is a polynomial algorithm for finding the Nash eq. ● FP = tool for approximating it (at least in 0-sum cases)
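The “worst case on the opponent's distribution” in the definition can be computed directly for a given mixed strategy, since the worst case is always reached on a pure strategy of the opponent. The helper names `row_guarantee` and `column_guarantee` below are hypothetical; the matrix is the matching-pennies game used later in the slides.

```python
def row_guarantee(M, x):
    """Worst-case expected reward of player 1 when playing the mixed strategy x."""
    rows, cols = range(len(M)), range(len(M[0]))
    return min(sum(x[i] * M[i][j] for i in rows) for j in cols)


def column_guarantee(M, y):
    """Largest expected reward player 1 can get against the mixed strategy y."""
    rows, cols = range(len(M)), range(len(M[0]))
    return max(sum(M[i][j] * y[j] for j in cols) for i in rows)


M = [[1, -1], [-1, 1]]          # matching pennies, reward for player 1
x = y = [0.5, 0.5]
print(row_guarantee(M, x), column_guarantee(M, y))
```

A pair (x, y) is a Nash equilibrium of the zero-sum game exactly when the two guarantees coincide; the gap between them measures how far a candidate pair is from equilibrium.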
  61. Fictitious play (Brown 1949) ● Each player starts with a distribution on its strategies ● Each player in turn: – Finds an optimal strategy against the current opponent's distribution (randomly break ties) – Adds it to its distribution (the distrib. does not sum to 1!)
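The two steps of the slide can be sketched as follows. This is a minimal illustration with assumptions that differ from the slide: ties are broken by lowest index rather than randomly (to keep the run reproducible), both players start with one count on their first strategy, and the function name `fictitious_play` is hypothetical.

```python
def fictitious_play(M, iterations):
    """Alternating fictitious play on a zero-sum matrix game (reward for player 1)."""
    n, m = len(M), len(M[0])
    counts1 = [0] * n   # unnormalised distribution of player 1
    counts2 = [0] * m   # unnormalised distribution of player 2
    counts1[0] = 1      # assumption: arbitrary starting strategies
    counts2[0] = 1
    for _ in range(iterations):
        # Player 1 maximises its reward against player 2's counts.
        payoffs1 = [sum(M[i][j] * counts2[j] for j in range(m)) for i in range(n)]
        counts1[payoffs1.index(max(payoffs1))] += 1
        # Player 2 minimises player 1's reward against the updated counts.
        payoffs2 = [sum(M[i][j] * counts1[i] for i in range(n)) for j in range(m)]
        counts2[payoffs2.index(min(payoffs2))] += 1
    total1, total2 = sum(counts1), sum(counts2)
    return [c / total1 for c in counts1], [c / total2 for c in counts2]


M = [[1, -1], [-1, 1]]          # matching pennies
x, y = fictitious_play(M, 20000)
print(x, y)
```

The counts are kept unnormalised, as on the slide, and only divided by their sum at the end to read off the empirical mixed strategies.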
  62. Matching pennies: payoff matrix [[1, -1], [-1, 1]] (i.e. player 1 wins iff i=j) ● HT1=(1,0) HT2=(0,1) ● HT1=(1,1) HT2=(0,2) ● HT1=(1,2) HT2=(1,2) ● HT1=(1,3) HT2=(2,2) ● HT1=(1,4) HT2=(3,2) ● HT1=(2,4) HT2=(4,2) ● HT1=(3,4) HT2=(5,2) ● HT1=(4,4) HT2=(6,2) ● HT1=(5,4) HT2=(6,3) .......
  65. Rock-paper-scissors ● Rock: 1, Paper: 0, Scissors: 0 ● RPS1=(1,0,0) RPS2=(1,0,0) ● RPS1=(1,1,0) RPS2=(1,1,0) ● RPS1=(1,2,0) RPS2=(1,1,1) ● RPS1=(1,3,0) RPS2=(1,1,2) ● RPS1=(2,3,0) RPS2=(1,2,2) ● … ===> converges to Nash (Robinson 51)
  66. Fictitious play TODO
  67. Improvements for KxK matrix games: approximations, and exact solution if k-sparse ● There exist ε-approximations of size O(log(K)/ε²) [Althoefer] ● Such an approximation can be found in time O(K log(K) / ε²) [Grigoriadis et al.]: basically a stochastic FP ● Exact solution in time O(K log K · k^{2k} + poly(k)) if the solution is k-sparse (Auger, Ruette, Teytaud) (good only if k is smaller than log(K)/log(log(K))! better?)
  70. Improvements for KxK matrix games: approximations. So, LP & FP are two tools for matrix games. Linear programming can be adapted to PO games without building the complete matrix (using information sets). The same for FP variants?
  71. Conclusions. There are still natural questions which provide nice decidability problems. Madani et al. (1 player against random, no observability), extended here to 2 players with no random ==> undecidable problems “less than” the Halting problem? Solving zero-sum matrix games is still an active area of research ● Approximate cases ● Sparse case
  72. Open problems ● Phantom-Go undecidable? (or other “real” game...) ● Complexity of Go with Chinese rules? (conjectured: PSPACE or EXPTIME; proved PSPACE-hard and in EXPSPACE) ● More to say about “epistemic” games (internal state not modified) ● Frontier of undecidability in PO games? (100% halting game: 2P becomes decidable) ● Chess with finitely many pieces on an infinite board: decidability of forced mate? (n-move: Brumleve et al., 2012, simulation in Presburger (thanks S. Riis :-) )