Sequential decision making:
 decidability and complexity



Games with partial
observation
 Olivier.Teytaud@inria.fr + too many people to all be cited. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI, Taiwan universities (including NUTN), CITINES project

TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal
Network of Excellence.


Paris
September 2012.
A quite general model

        A directed graph (finite).
 A starting point on the graph, a target (or
  several targets, with different rewards).
          I want to reach a target.

     Labels(=decisions) on edges:
  Next node = f( current node, decision)

            Each node is either:
      - random node (random decision).
    - decision node (I choose a decision)
  - opponent node (an opponent chooses)
Partial observation



           Each decision node
   is equipped with an observation;
     you can make decisions using
      the list of past observations

       ==> you don't know
     where you are in the graph
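A minimal Python sketch of this model (illustrative only; the GRAPH dictionary, the node names and the play() helper are not from the talk): nodes are decision / random / opponent nodes, edges are labelled by decisions, and player 1's policy only sees the list of its past observations.

```python
import random

# Each node is 'random', 'decision' (player 1) or 'opponent' (player 2);
# edges are labelled by decisions: next node = edges[node][decision].
GRAPH = {
    "start":  {"type": "decision", "obs": "o0", "edges": {"a": "chance", "b": "lose"}},
    "chance": {"type": "random", "edges": {"0": "fight", "1": "win"}},
    "fight":  {"type": "opponent", "edges": {"x": "win", "y": "lose"}},
    "win":    {"type": "terminal", "reward": 1},
    "lose":   {"type": "terminal", "reward": 0},
}

def play(p1_policy, p2_policy, start="start"):
    """Play one episode; player 1 only sees the list of its past observations."""
    node, observations = start, []
    while GRAPH[node]["type"] != "terminal":
        info = GRAPH[node]
        if info["type"] == "random":
            decision = random.choice(list(info["edges"]))
        elif info["type"] == "opponent":
            decision = p2_policy(node)            # the opponent may see everything
        else:                                     # decision node
            observations.append(info["obs"])      # partial observation only
            decision = p1_policy(tuple(observations))
        node = info["edges"][decision]
    return GRAPH[node]["reward"]

# A policy for player 1 is any map from observation histories to decisions.
print(play(lambda obs: "a", lambda node: "x"))
```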
Overview

●   10%: overview of Alternating Turing
    machine & computational complexity
                          (great tool for complexity upper bounds)

●   50%: general culture on games
                          (including undecidability)
●   35%: general culture on fictitious play
         (matrix games)       (probably no time for this...)
●   4%: my results on that stuff
    ==> 2 detailed proofs (one new)
    ==> feel free to interrupt
Outline


●   Complexity and ATM


●   Complexity and games (incl. planning)


●   Bounded horizon games
Classical complexity classes
 P ⊂ NP ⊂ PSPACE ⊂ EXPTIME ⊂ NEXPTIME ⊂ EXPSPACE


 Proved:
 PSPACE ≠ EXPSPACE       P ≠ EXPTIME
 NP ≠ NEXPTIME


 Believed, not proved:
 P≠NP                    EXPTIME≠NEXPTIME
 NEXPTIME≠EXPSPACE
Complexity and alternating
 Turing machines
●   Turing machine (TM) = abstract computer
●   Non-deterministic Turing Machine (NTM)
       = TM with “exists” states (i.e. several
       transitions, accepts if at least one
       accepts)
●   Co-NTM: TM with “for all” states (i.e.
    several transitions, accepts if all lead to
    accept)
●   ATM: TM with both “exists” and “for all”
    states.
Alternation
Non-determinism & alternation
Outline


●   Complexity and ATM


●   Complexity and games (incl.
    planning)


●   Bounded horizon games
Computational complexity:
 framework



 Uncertainty can be:
     –   Adversarial: I focus on worst case
     –   Stochastic: I focus on average result
     –   Or both.


 “Stochastic = adversarial” if goal = 100%
 success.
 “Stochastic != adversarial” in the general case.
Computational complexity:
 framework

 Many representations for problems. E.g.:
    –   Succinct: a circuit computes the ith bit of
         the proba that action a leads to a
         transition from s to s'
    –   Compressed: a circuit computes many bits
         simultaneously
    –   Flat: longer encoding (transition tables)

 ==> does not matter for decidability
 ==> matters for complexity
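A toy Python illustration (not from the slides) of the difference: the same transition function over n-bit states, written once as a small program (compressed/succinct style) and once as an explicit table (flat style), which is exponentially larger.

```python
# Toy illustration: one deterministic transition function over n-bit states,
# in compressed vs. flat form.
n = 10                                   # state space of size 2^n

# Compressed / succinct style: a short program computing the transition;
# the description is tiny even though there are 2^n states.
def step(state: int, action: int) -> int:
    return (3 * state + action) % (1 << n)

# Flat style: the explicit transition table, of size 2 * 2^n
# (hopeless for the exponentially large state spaces used in hardness proofs).
flat_table = {(s, a): step(s, a) for s in range(1 << n) for a in (0, 1)}

assert flat_table[(7, 1)] == step(7, 1)  # same object, very different encodings
```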
Computational complexity:
 framework

 Many representations for problems. E.g.:
    –   Succinct
    –   Compressed
    –   Flat


 Compressed representation “somehow” natural
 (state space has exponential size, transitions
 are fast): see e.g. Mundhenk for detailed defs
 and flat representations.
Computational complexity:
 framework
 We use mainly compressed representation; see
 also Mundhenk for flat representations.


 Typically, exponentially small representations
 lead to exponentially higher complexity
 ==> but it's not always the case...


 Simple things can change the complexity a lot:
 “superko”: rules forbid repeating a previous position;
 some fully observable 2-player games become
 EXPSPACE instead of EXP ==> discussed later
Computational complexity: framework
 for first tables of results

  Either search (find a target)
        or optimize (cumulate rewards over time)

  Compressed (written with circuits or others...)
  or not (flat).

  Horizon:
  - Short horizon: horizon ≤ size of input
  - Long horizon: log2(horizon) ≤ size of input
  - Infinite horizon: no limit
Mundhenk's summary: one player,
 limited horizon: expected reward >0 ?
Mundhenk's summary: one player, non-negative
 reward, looking for non-neg. average reward
 (= positive proba of reaching): easier
Complexity, partial observation, infinite
 horizon, proba of reaching a target


●   1P+random, unobservable: undecidable
    (Madani et al)
●   1P+random, P(win=1),
        or equivalently 2P, P(win=1):
                      [Rintanen and refs therein]
         –   Fully observable: EXP   [Littman94]

         –   Unobservable: EXPSPACE       [Haslum et al., 2000]
         –   Partial observability: 2EXP [Rintanen, 2003]


             Rmk: “2P, P(win=1)” is not “2P”!
Complexity, partial observation,
 infinite horizon

●   2P vs 1P, P(win)=1 ?: undecidable! [Hearn, Demaine]
●   2P (random or not):
       –   Existence of sure win: equiv. to 1P+random !
              ●   EXP full-observable (e.g. Go, Robson 1984)
              ●   PSPACE unobservable
              ●   2EXP partially observable
       –   Existence of sure win, same state forbidden:
            EXPSPACE-complete (Go with Chinese rules ?
            rather conjectured EXPTIME or PSPACE...)
       –   General case (optimal play): undecidable
            (Auger, Teytaud) (what about phantom-Go ?)
Complexity, partial observation

    Remarks:
●   Continuous case ?
●   Purely epistemic (we gather information, we
    don't change the state) ? [Sabbadin et al]
●   Restrictions on the policy, on the set of
    actions...
●   Discounted reward
●   DEC-POMDP, POSG : many players,
    same/opposite/different reward functions...
What are the approaches ?

 –   Dynamic programming              (Massé – Bellman 50's) (still
     the main approach in industry), alpha-beta, retrograde analysis
 –   Reinforcement learning
 –   MCTS (R. Coulom. Efficient Selectivity and Backup
     Operators in Monte-Carlo Tree Search. In
     Proceedings of the 5th International Conference on
     Computers and Games, Turin, Italy, 2006)
 –   Scripts + Tuning / Direct Policy Search
 –   Coevolution


     All have their PO extensions, but the last two
     are the most convenient in this case.
Partially observable games

    Many tools for fully observable games.
    Not so many for partially observable ones.


●   Shi-Fu-Mi (Rock Paper Scissor)


●   Card games


●   Phantom games
Shi-Fu-Mi (Rock-Paper-Scissors)
●   Fully observable in simultaneous play, but
    partially observable in turn-based version.




●   Computers stronger than humans (yes, it's
    true).
Card games, phantom games
●   Phantomized version of a game:
       –   You don't see the move of your opponents
       –   If you play an illegal move, you are
              informed that it's illegal, you play again
       –   Usually, you get a bit more information
             (captures, threats...) <== game-dependent
●   Phantom-games:
       –   phantom-Chess = Kriegspiel
           ==> Dark Chess: more info
       –   phantom-Go
       –   etc.
Partially observable games
●   Usually quite heuristic algorithms
●   Best performing algorithms combine:
       –   Opponent modelling (as for Shi-Fu-Mi)
       –   Belief state (often by Monte-Carlo
            simulations)
       –   Not a lot of tree search
       –   A lot of tuning
           ==> usually no consistency analysis
Part I: Complexity analysis
(unbounded horizon)


 –   Game:
             ●   One or two players
             ●   Win, loss, draw (incl. endless loop)


 –   Partial observability, no random part


 –   Finite state space:
             ●   state=transition(state,action)
             ●   action decided by each player in turn
State of the art




 - makes sense in fully observable games
 - not so much in non-observable games
State of the art




 EXPTIME-complete in the general
   fully-observable case
EXPTIME-complete fully
observable games


  - Chess (for some nxn generalization)

  - Go (with no superko)

  - Draughts (international or English)

  - Chinese checkers

  - Shogi
PSPACE-complete fully
observable games

    - Amazons
    - Hex                        polynomial horizon
    - Go-moku                              +
    - Connect-6                    full observation
    - Qubic                         ==> PSPACE
    - Reversi
    - Tic-Tac-Toe


      Many games with filling of each cell once and only once
EXPSPACE-complete
unobservable games                (Haslum & Jonsson)



      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form, infinite horizon).

              (still for the 100%-win “UD” criterion -
                   for the not fully observable cases it
                       is necessary to be precise...)

Importantly, the UD criterion means that the strategies
  are the same whether the opponent has full observation
 or no observation at all ==> UD is very bad :-(
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).


PROOF:
 (I) First note that strategies are just sequences of actions
    (no observability!)
 (II) It is in EXPSPACE=NEXPSPACE, because of the
   following algorithm:
   (a) Non-deterministically choose the sequence of
       actions (an exponential list of actions is enough...)
   (b) Check the result against all possible opponent strategies
 (III) We have to check the hardness only.
EXPSPACE-complete
unobservable games (Haslum & Jonsson)

      The two-player unobservable case is
      EXPSPACE-complete
      (games in succinct form).
  PROOF of the hardness:
   Reduction from: is my TM with an exponential tape
    going to halt ?

  Consider a TM with tape of size N=2^n.

  We must find a game
  - with size n              ( n= log2(N) )
  - such that player 1 has a winning
         strategy iff the TM halts.
EXPSPACE-complete unobservable games (Haslum & Jonsson)
 Encoding a Turing machine with a tape of size N
           as a game with state O(log(N))


        Player 1 chooses the sequence of
        configurations of the tape (N=4):

         x(0,1),x(0,2),x(0,3),x(0,4) ==> initial state
         x(1,1),x(1,2),x(1,3),x(1,4)
         x(2,1),x(2,2),x(2,3),x(2,4)
         x(3,1),x(3,2),x(3,3),x(3,4)
          .....................................
         x(N,1), x(N,2), x(N,3), x(N,4)

  Player 1 wins by reaching a final state!

  Except if P2 finds an illegal transition:
  ==> P2 can check the consistency of one 3-uple per line
  ==> this requests space log(N) ( = position of the 3-uple)
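A hedged Python sketch of the local check behind this argument (the encoding, delta, window and legal_cell are illustrative, not the talk's construction): whether cell j of a row is a legal successor of the previous row only depends on cells j-1, j, j+1, so player 2 can challenge a single 3-uple and only needs log2(N) bits to name its position.

```python
# A configuration is a tuple of cells; the cell under the head is written as
# (state, symbol), the other cells as just the symbol.
# delta maps (state, symbol) -> (new_state, written_symbol, move in {-1, +1}).

def window(conf, j):
    """The 3-cell window (j-1, j, j+1), with blanks outside the tape."""
    return tuple(conf[k] if 0 <= k < len(conf) else "_" for k in (j - 1, j, j + 1))

def legal_cell(delta, prev, nxt, j):
    """Cell j of the next configuration is determined by cells j-1, j, j+1 of
    the previous one: this is the 3-uple player 2 may challenge."""
    left, mid, right = window(prev, j)
    expected = mid                                  # default: nothing changes
    for pos, cell in ((j - 1, left), (j, mid), (j + 1, right)):
        if isinstance(cell, tuple):                 # the head is in the window
            state, sym = cell
            new_state, written, move = delta[(state, sym)]
            if pos == j:                            # the head was here: it writes
                expected = written
            if pos + move == j:                     # the head arrives here
                expected = (new_state, expected)
    return nxt[j] == expected

# Tiny demo: in state "q", reading "0", write "1", move right, stay in "q".
delta = {("q", "0"): ("q", "1", +1)}
prev = (("q", "0"), "0", "0")
nxt  = ("1", ("q", "0"), "0")
print(all(legal_cell(delta, prev, nxt, j) for j in range(3)))   # True
```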
EXPSPACE-complete
unobservable games



     The 1P+unknown initial state in the
     unobservable case is
     EXPSPACE-complete
     (games in succinct form).

     2P+unobservable as well.
2EXPTIME-complete PO games




     The two-player PO case,
       or 1P+random PO is
     2EXP-complete
     (games in succinct form).

   (2P = 1P+random because of UD)
Undecidable games               (B. Hearn)




     The three-player PO case is
     undecidable. (two players against one,
     not allowed to communicate)
Hummm ?




 Do you know a PO game in which you can
 ensure a win with probability 1 ?
Another formalization




                 [figure: decide whether the target can be reached with probability ≥ c]




 ==> much more satisfactory
    (might have drawbacks as well...)
Madani et al.








  1 player + random = undecidable
                    (even without opponent!)
Madani et al.

 1 player + random = undecidable.
 ==> answers a (related) question by
          Papadimitriou and Tsitsiklis.

 Proof ?

 Based on the emptiness problem for
 probabilistic finite automata (see Paz 71):

 Given a probabilistic finite automaton,
 is there a word accepted with proba at least c ?
 ==> undecidable
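To make the statement concrete, here is a toy probabilistic finite automaton in Python (the automaton, its matrices and accept_proba are illustrative, not from the paper): each letter applies a stochastic matrix to the state distribution, and the undecidable question is whether some word reaches acceptance probability at least c.

```python
import numpy as np

# Toy PFA: states {0, 1}, initial distribution pi, one stochastic matrix
# per letter, accepting state 1.
pi = np.array([1.0, 0.0])
M = {"a": np.array([[0.5, 0.5],
                    [0.0, 1.0]]),
     "b": np.array([[1.0, 0.0],
                    [0.5, 0.5]])}
accepting = np.array([0.0, 1.0])

def accept_proba(word: str) -> float:
    """Probability that the PFA ends in the accepting state after reading word."""
    dist = pi
    for letter in word:
        dist = dist @ M[letter]
    return float(dist @ accepting)

print(accept_proba("aab"))   # 0.375; the emptiness problem asks: is some word >= c ?
```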
Consequence for unobservable
games








 1 player + random = undecidable
 ==> 2 players = undecidable.
Proof of “undecidability with 1 player
against random” ==> “undecidability with
2 players”


   How to simulate 1 player + random with 2
   players ?
A random node to be rewritten

     Rewritten as follows:
 ●   Player 1 chooses a in [[0,N-1]]
 ●   Player 2 chooses b in [[0,N-1]]
 ●   c=(a+b) modulo N
 ●   Go to tc

Each player can force the game to be equivalent to
the initial one (by playing uniformly)
==> the proba of winning for player 1 (in case of perfect play)
   is the same as for the initial game
==> undecidability!
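A quick empirical sanity check of this step (illustrative Python, not from the talk): if player 1 draws a uniformly in [[0,N-1]], then c = (a+b) mod N is uniform whatever b the opponent plays, so either player can unilaterally reproduce the original random node.

```python
import random
from collections import Counter

N = 5
adversarial_b = 3                      # any fixed (even adversarial) choice of player 2
counts = Counter((random.randrange(N) + adversarial_b) % N for _ in range(100000))
print(sorted(counts.items()))          # all N values appear ~ 100000 / N times
```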
Important remark



  Existence of a strategy for winning with
  proba 0.5 is also undecidable for the
  restriction to games in which the optimal
  winning proba is >0.6 or <0.4 ==> not just
  a subtle precision issue.
So what ?

 We have seen that
           unbounded horizon
        + partial observability
        + natural criterion (not sure win)
        ==> undecidability
        contrary to what is expected from the usual definitions.



 What about bounded horizon, 2P ?
    –   Clearly decidable
    –   Complexity ?
    –   Algorithms ? (==> coevolution & LP)
Complexity (2P, 0-sum, no random)

                        Unbounded              Exponential              Polynomial
                        horizon                horizon                  horizon

Full observability      EXP                    EXP                      PSPACE

No obs (X=100%)         EXPSPACE               NEXP
                        (Haslum et al, 2000)

Partially observable    2EXP                   EXPSPACE
(X=100%)                (Rintanen)             (Mundhenk)

Simult. actions         ? EXPSPACE ?           <= EXP                   <= EXP

No obs                  undecidable            <= 2EXP (PL)             <= EXP (PL)
                                               (concise matrix games)

Partially observable    undecidable            <= 2EXP (PL)             <= EXP (PL)
Part II: Fictitious play (bounded
horizon) in the antagonist case

    Fictitious play ?
    Somehow an abstract version of
    antagonist coevolution with full memory

●   unlimited population (finite, but
    increasing): one more individual per iteration
●   perfect choice of each mutation against
    the current population of opponents
Part II: Fictitious play in the
zero-sum case

  Why zero-sum cases ?


  Evolutionarily stable solutions (found by
  FP) are usually sub-optimal (as is nature, when
  choosing lions' strategies or cheating behaviors in the
  Scaly-breasted Munia)
What is a matrix 0-sum game ?


●   A matrix M is given (type n x m).
●   Player 1 chooses (privately) i in [[1,n]]
●   Player 2 chooses              j in [[1,m]]
●   Reward
      = Mij for player 1
      = -Mij for player 2 (zero-sum game)
    ==> Model for finite antagonist games
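For reference (standard statement, not on the slide), the value of such a game over mixed strategies x and y is given by von Neumann's minimax theorem:

```latex
\[
  v(M) \;=\; \max_{x \in \Delta_n} \;\min_{y \in \Delta_m} \; x^\top M y
        \;=\; \min_{y \in \Delta_m} \;\max_{x \in \Delta_n} \; x^\top M y ,
\]
% where \Delta_n and \Delta_m are the probability simplices over rows and columns.
```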
Nash equilibrium

●   Nash equilibrium: there is a distribution
    of probability for each player
             (= mixed strategy)
    such that the reward is optimum (for the
    worst case on the distribution of
    probabilities by the opponent)
●   Linear programming gives a polynomial-time
    algorithm for finding the Nash eq. (sketch below)
●   FP= tool for approximating it
                   (at least in 0-sum cases)
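A minimal sketch of that LP, assuming SciPy is available (the zero_sum_nash helper is illustrative, not code from the talk): maximize the guaranteed value v subject to x being a probability distribution earning at least v against every pure reply of player 2.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(M):
    """Optimal mixed strategy for player 1 and value of the zero-sum matrix
    game M (player 1 maximizes x^T M y), via one linear program."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    # Variables: x_1..x_n (mixed strategy) and v (guaranteed value); maximize v.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                # linprog minimizes, so minimize -v
    # For every pure reply j of player 2:  v - sum_i M[i, j] x_i <= 0
    A_ub = np.hstack([-M.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # x is a probability distribution: sum_i x_i = 1
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Matching pennies: value 0, uniform strategy (0.5, 0.5).
x, v = zero_sum_nash([[1, -1], [-1, 1]])
print(x, v)
```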
Fictitious play                 (Brown 1949)


●   Each player starts with a distribution on
    its strategies
●   Each player in turn:
       –   Finds an optimal pure strategy against the
             opponent's current distribution (ties
            broken at random)

       –   Adds it to its distribution (the distrib. does
            not sum to 1!)   ==> see the sketch below
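The sketch below (illustrative Python; fictitious_play is not the talk's code) implements exactly this loop: each player keeps counts of the pure strategies it has played, best-responds to the opponent's empirical mixture with random tie-breaking, and the normalized counts converge to a Nash equilibrium in zero-sum games (Robinson 1951); the counts play the same role as the HT1/HT2 tallies on the next slides.

```python
import numpy as np

def fictitious_play(M, iterations=10000):
    """Fictitious play (Brown 1949) on the zero-sum matrix game M."""
    M = np.asarray(M, dtype=float)
    n, m = M.shape
    counts1, counts2 = np.zeros(n), np.zeros(m)
    counts1[0] += 1                          # arbitrary initial pure strategies
    counts2[0] += 1
    rng = np.random.default_rng(0)
    for _ in range(iterations):
        # Player 1: best pure reply to player 2's empirical mixture (random tie-break).
        payoffs1 = M @ counts2
        counts1[rng.choice(np.flatnonzero(payoffs1 == payoffs1.max()))] += 1
        # Player 2: best pure reply to player 1's empirical mixture (minimizes M).
        payoffs2 = counts1 @ M
        counts2[rng.choice(np.flatnonzero(payoffs2 == payoffs2.min()))] += 1
    return counts1 / counts1.sum(), counts2 / counts2.sum()

# Matching pennies: both normalized count vectors approach the uniform Nash mixture.
print(fictitious_play([[1, -1], [-1, 1]]))
```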
Matching pennies

    1 -1        (i.e. player 1 wins iff i=j)
    -1 1
●   HT1=(1,0)    HT2=(0,1)
●   HT1=(1,1)    HT2=(0,2)
●   HT1=(1,2)    HT2=(1,2)
●   HT1=(1,3)    HT2=(2,2)
●   HT1=(1,4)    HT2=(3,2)
●   HT1=(2,4)    HT2=(4,2)
●   HT1=(3,4)    HT2=(5,2)
●   HT1=(4,4)    HT2=(6,2)
●   HT1=(5,4)    HT2=(6,3) .......
Rock-paper-scissors


●   Initial counts: Rock: 1, Paper: 0, Scissors: 0
●   RPS1=(1,0,0)          RPS2=(1,0,0)
●   RPS1=(1,1,0)          RPS2=(1,1,0)
●   RPS1=(1,2,0)          RPS2=(1,1,1)
●   RPS1=(1,3,0)          RPS2=(1,1,2)
●   RPS1=(2,3,0)          RPS2=(1,2,2)
●   …
    ===> converges to Nash (Robinson 51)
Fictitious play

  TODO
Improvements for KxK matrix
game: approximations & exact solution if k-sparse

●   There exist ε-approximations with support of
    size O(log(K)/ε²) [Althoefer]
●   Such an approximation can be found in
    time O(K log(K) / ε²) [Grigoriadis et al]: basically a
    stochastic FP
●   Exact solution in time         (Auger, Ruette, Teytaud)

      O( K log(K) · k^(2k + poly(k)) )
       if the solution is k-sparse (good only if k is
       smaller than log(K)/log(log(K)) !
       better ?)
Improvements for KxK matrix
game: approximations

 So, LP & FP are two tools for matrix
 games.


 LP can be adapted to PO games without
 building the complete matrix (using
 information sets).


 Can the same be done for FP variants ?
Conclusions

  There are still natural questions which
  provide nice decidability problems:
  Madani et al (1 player against random, no observability), extended here to
  2 players and no randomness



  ==> undecidable problems “less than”
     the Halting problem ?

  Solving zero-sum matrix games is still an
  active area of research
                ●   Approximate cases
                ●   Sparse case
Open problems

●   Phantom-Go undecidable ?             (or other “real” game...)
●   Complexity of Go with Chinese rules ?
      (conjectured: PSPACE or EXPTIME;
       proved: PSPACE-hard and in EXPSPACE)
●   More to say about “epistemic” games (internal
    state not modified)
●   Frontier of undecidability in PO games ?
    (100%-halting games: 2P becomes decidable)
●   Chess with finitely many pieces on an infinite board:
    decidability of forced mate ?
    (mate-in-n: Brumleve et al, 2012, by simulation in Presburger
     arithmetic)                               (thanks S. Riis :-) )
