Simple Lemmas on Partially Observable Games, and Applications to Phantom tic-tac-toe, Kriegspiel and Phantom-Go


@inproceedings{teytaud:inria-00625794,
hal_id = {inria-00625794},
url = {http://hal.inria.fr/inria-00625794},
title = {{Lemmas on Partial Observation, with Application to Phantom Games}},
author = {Teytaud, Fabien and Teytaud, Olivier},
abstract = {{Solving games is usual in the fully observable case. The partially observable case is much more difficult; whenever the number of strategies is finite (which is not necessarily the case, even when the state space is finite), the main tool for the exact solving is the construction of the full matrix game and its solving by linear programming. We here propose tools for approximating the value of partially observable games. The lemmas are relatively general, and we apply them for deriving rigorous bounds on the Nash equilibrium of phantom-tic-tac-toe and phantom-Go.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Computational Intelligence and Games}},
address = {Seoul, South Korea},
audience = {international},
year = {2011},
month = Sep,
pdf = {http://hal.inria.fr/inria-00625794/PDF/phantomatari.pdf},
}



Simple Lemmas on Partially Observable Games, and Applications to Phantom tic-tac-toe, Kriegspiel and Phantom-Go

  1. Phantom Games. Outline: (1) Phantom games and phantom-Go; (2) Maths; (3) Experiments. F. Teytaud, O. Teytaud. TAO, Inria-Saclay IDF, CNRS 8623, LRI, Univ. Paris-Sud; OASE Lab, Korea, Summer 2011.
  2. What are phantom games? Phantom-X is the partial-information counterpart of a (full-information) game X: you are not informed of your opponent's moves, so you might play illegal moves; you are then informed that they are illegal, and you simply play again. Extremal case: no other information (only illegal moves). More convenient: a bit more information, e.g. being informed of ataris (in Go), or seeing all the locations you can reach (Dark Chess).
  3. What are phantom games? (continued) Example: phantom Tic-Tac-Toe.
  4. What are phantom games? (continued) My opponent plays (I don't know where).
  5. What are phantom games? (continued) I try this move... ==> illegal move!
  6. What are phantom games? (continued) I know the state... ==> good :-)
  7. Example: Dark Chess. Different from Chinese Dark Chess; also known as "Fog of War".
  8. Example: phantom-Go.
  9. A little bit of maths (sorry). Outline: (1) Phantom games and phantom-Go; (2) Maths; (3) Experiments.
  10. Simple things. Consider a 2-player game with a finite state space in which one of the two players wins. Then: (a) Full information: one of the players has a winning strategy, and minimax tells us which one; possibly 2EXP-complete (Go with Japanese rules, Robson's paper). (b) Partial information, finite horizon: there exists p such that player 1 wins with probability p under perfect play, and p is computable. (c) Partial information, infinite horizon: p is not computable! (Auger et al., 2010, submitted.)
  11. Other simple things. The previous results were already known, and are mathematically hard. Now for some simple results with concrete applications. Goal: make the approximate solving of partially observable games more tractable, with precise bounds.
  12. Other simple thing ==> practice. Differences with full-information games, and applications: (a) Good strategies are randomized when playing games with hidden information. Illustration: play rock-paper-scissors; if you play a fixed (deterministic) strategy, at least one opponent strategy beats you every time. (b) Remark: there is an optimal strategy which is invariant w.r.t. rotations/symmetries ==> we can work with only one version and then symmetrize (uniformly) ==> no loss of optimality (in the Nash sense).
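
The rock-paper-scissors illustration on slide 12 can be checked numerically. The following is a minimal sketch, not taken from the slides: the payoff convention (+1 win, 0 draw, -1 loss) and the helper name `worst_case_value` are assumptions made here for illustration.

```python
# Rock-paper-scissors payoffs for the row player: +1 win, 0 draw, -1 loss.
# Rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_value(mixed):
    """Expected payoff of a mixed strategy against the opponent's best reply."""
    return min(
        sum(p * PAYOFF[i][j] for i, p in enumerate(mixed))
        for j in range(3)
    )


pure_rock = [1.0, 0.0, 0.0]        # a fixed (deterministic) strategy
uniform = [1 / 3, 1 / 3, 1 / 3]    # the uniformly randomized strategy

print(worst_case_value(pure_rock))  # always beaten by paper
print(worst_case_value(uniform))    # no reply gains anything on average
```

Any fixed move has a worst-case payoff of -1, while the uniform mix guarantees an expected payoff of 0, which is why randomization is needed here.
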
  13. Yet another simple thing ==> practice. Change the game as follows: player 2 chooses the hidden state when in state S. Then the game is harder for player 1 (in terms of game-theoretical value). ==> So we can lower-bound the value by considering the worst case over the opponent's strategies, assuming he is allowed to rebuild the hidden state (consistently with your observations, however). ==> You get a matrix game (see example later). ==> If you have both lower and upper bounds, you can estimate the value of a history of observations. ==> Looks stupid, but simplifies the analysis (examples on the next slide).
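
The bounding argument of slide 13 can be summarized as a pair of inequalities. This is only a sketch: the symbols $h$, $v$, $\underline{v}$ and $\overline{v}$ are notation introduced here, not taken from the slides, and $\overline{v}$ stands for any upper bound that happens to be available.

```latex
% h               : a history of observations of player 1 (assumed notation)
% v(h)            : game-theoretical value for player 1 after h
% \underline{v}(h): value of the matrix game obtained when player 2 is allowed
%                   to rebuild the hidden state consistently with h
% \overline{v}(h) : any available upper bound on v(h)
\[
  \underline{v}(h) \;\le\; v(h) \;\le\; \overline{v}(h)
\]
```

When the two outer quantities are close, they pin down the value of the history $h$ up to their difference.
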
  14. Examples: 4x4 Ponnuki (phantom version). Simple case (you do it naturally). [Board diagrams not preserved.] Sure win in 4x4 Ponnuki (phantom or not) ==> at least 1/3 for Black.
  15. Examples: 4x4 Ponnuki. Better case (you don't do it without thinking about the method). [Board diagrams not preserved.] Sure win in 4x4 Ponnuki (phantom or not) ==> at least 1/3 for Black.
  16. One more simple thing ==> practice. Specifically for phantom games: if a move is either a win or an "illegal" move, then play it. Trivially OK (no loss of optimality), and it greatly reduces the set of strategies ==> it can't hurt ==> very compact representation.
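
For phantom tic-tac-toe, the "win or illegal" rule of slide 16 can be sketched as follows. This is an illustrative reading, not the authors' implementation: the board encoding (cells 0 to 8, row by row) and the function name `dominating_moves` are assumptions made here. A cell qualifies when its status is unknown to the player and it would complete a line of the player's own marks.

```python
# Cell indices 0..8, row by row. All eight winning lines of tic-tac-toe.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def dominating_moves(mine, known_opponent):
    """Cells of unknown status that would complete a line of my marks.

    If such a cell is empty, playing it wins. If the opponent holds it, the
    attempt is declared illegal and I move again, having learned something.
    """
    unknown = set(range(9)) - set(mine) - set(known_opponent)
    moves = set()
    for line in LINES:
        missing = [c for c in line if c not in mine]
        if len(missing) == 1 and missing[0] in unknown:
            moves.add(missing[0])
    return sorted(moves)


# I hold cells 0, 1 and 4, and I already learned that cell 2 is the opponent's.
print(dominating_moves({0, 1, 4}, {2}))
```

In the usage example, cells 7 and 8 each complete a line through cell 4, so trying either one results in a win or in a free piece of information.
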
  17. One last simple thing ==> practice. Specifically for phantom games: if, in the fully observable game X, there are N possible sequences of actions and player 1 wins surely, then player 1 wins with probability at least 1/N in phantom-X. Proof: by playing a uniformly random sequence of actions, player 1 plays the optimal sequence with probability at least 1/N, and hence wins with probability at least 1/N.
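
The proof on slide 17 amounts to a one-line inequality. The notation below is assumed here for illustration and does not appear in the slides: $A$ is the action sequence that player 1 draws uniformly at random among the $N$ possible ones, and $a^{*}$ is the sequence that the sure-win strategy of game X would produce against the opponent's actual (hidden) play.

```latex
% A    : sequence of actions drawn uniformly at random by player 1
% a^*  : sequence the sure-win strategy of X would have played against
%        the opponent's actual, unobserved moves
% N    : number of possible sequences of actions of player 1
\[
  \Pr[\text{player 1 wins phantom-}X]
  \;\ge\; \Pr[A = a^{*}]
  \;\ge\; \frac{1}{N}.
\]
```
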
  18. Good in Go, bad in phantom-Go: tightness. Black to play. Go: Black has lost. Phantom-Go: Black wins with probability 1 - 1/8! (==> the bound from the previous slide is nearly tight.)
  19. Some results on real games. Outline: (1) Phantom games and phantom-Go; (2) Maths; (3) Experiments (manually performed :-) ).
  20. Phantom-tic-tac-toe. Strategy for the 1st player in phantom-tic-tac-toe ==> dominating moves = moves which are either illegal or wins.
  21. Phantom-tic-tac-toe: bounds. Then, define 6 families of strategies for White, covering all possible cases; using the "simple facts", we show that in all cases the 1st player wins with probability at least 3/4. But the 2nd player can ensure a draw in TTT. So, by the Lemma: value of phantom-TTT in [interval not preserved].
  22. Phantom-tic-tac-toe: bounds. Then, define 6 families of strategies for White, covering all possible cases; using the "simple facts", we show that in all cases the 1st player wins with probability at least 3/4. But the 2nd player can ensure a draw in TTT (384 = number of legal sequences as 2nd player). So, by the Lemma: value of phantom-TTT in [interval not preserved].
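
Two facts used on slides 21 and 22 can be checked with a short script: that the 2nd player can ensure a draw in ordinary tic-tac-toe, and where a count of 384 may come from. This is a minimal sketch; the board encoding and function names are assumptions made here, and the product 8 x 6 x 4 x 2 is only one plausible reading of how the slide counts the 2nd player's sequences.

```python
from functools import lru_cache

# Cell indices 0..8, row by row. All eight winning lines of tic-tac-toe.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Value for X under perfect play: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    values = [
        minimax(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == "."
    ]
    return max(values) if player == "X" else min(values)


# Full-information value with X (1st player) to move on an empty board.
ttt_value = minimax("." * 9, "X")

# Assumed count: the 2nd player moves with 8, 6, 4, then 2 free cells left.
second_player_sequences = 8 * 6 * 4 * 2

print(ttt_value, second_player_sequences)
```

A value of 0 for the empty board means neither player can force a win, so the 2nd player can indeed ensure a draw in the fully observable game.
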
  23. Phantom-ponnuki. 3x3 is a win for Black. 4x4 is a win for Black with probability: [value not preserved] (by conversion/inequalities with matrix games).
  24. Conclusions. Here are some simple tools, with rigorous bounds on phantom-tic-tac-toe and on phantom-Ponnuki in 3x3 and 4x4. The main tool is generic (opponent chooses hidden state ==> matrix game). Main further work: implementation inside a search algorithm (e.g. for ranking moves or evaluating leaves); other simplification ideas? E.g. more on worst-case analysis.
  25. Conclusions. PO board games = great challenge: Phantom-Go (are humans still stronger than computers?); Fog of War (don't know); MineSweeper: usual solvers are not optimal (they optimize the short term only: minimum probability of a mine); we got optimal play in 6x6 with 4 mines. ==> Better models than board games for real AI? ==> Involves a taste for danger; beyond IQ? ==> My feeling: many CI improvements are possible here, and maths can help (human-level performance at Urban Rivals, a PO card game).
  26. Finished! Thanks for your attention!
