Bandit-based Monte-Carlo planning: the game of Go and beyond



Games

Olivier.Teytaud@inria.fr + too many people to cite them all. Includes Inria, Cnrs, Univ.
Paris-Sud, LRI, CMAP, Univ. Amsterdam, Taiwan universities (including NUTN)

TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud,
Digiteo Labs, Pascal Network of Excellence.




TAO, January 2010 (updated 2012).
Games


Introduction

Complexity measures

Computational complexity

Partial observability

Zoology
Introduction to games

Partially or fully observable (“phantom” games)
Randomized or not
Iterated or not (reputation)
1, 2, 3, ... players
Decentralized or not (rengo)
Continuous or not
Infinite time or not
Complexity measures
(not always well defined)

State-space complexity = number of possible states
Game-tree size = number of leaves
Decision complexity = min # of leaves of a tree showing perfect play
Game-tree complexity = # of leaves for perfect play with constant depth
Computational complexity (= complexity classes, later)
Perfect-play complexity (complexity of a perfect algorithm)
State of the art level
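A minimal sketch of the first two measures, assuming Tic-Tac-Toe as the example game (the board encoding and the function names are illustrative, not taken from the slides). It enumerates every position reachable from the empty board to get the state-space complexity, and counts finished games to get the game-tree size.

```python
from typing import Optional, Set, Tuple

Board = Tuple[str, ...]  # 9 cells, each "X", "O" or "."

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def explore(board: Board, player: str, seen: Set[Board]) -> int:
    """Record `board` in `seen` and return the number of leaves below it."""
    seen.add(board)
    if winner(board) is not None or "." not in board:
        return 1  # finished game: one leaf of the game tree
    leaves = 0
    for i in range(9):
        if board[i] == ".":
            child = board[:i] + (player,) + board[i + 1:]
            leaves += explore(child, "O" if player == "X" else "X", seen)
    return leaves


def tic_tac_toe_measures() -> Tuple[int, int]:
    """Return (state-space complexity, game-tree size) for Tic-Tac-Toe."""
    seen: Set[Board] = set()
    leaves = explore(("." ,) * 9, "X", seen)
    return len(seen), leaves


if __name__ == "__main__":
    states, leaves = tic_tac_toe_measures()
    print(states, leaves)  # 5478 reachable states, 255168 finished games
```

The gap between the two numbers comes from transpositions: many move orders reach the same position, so the tree is much larger than the set of states.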
State of the art level

Very weak solving
  Means that we know who should win
  Typically proved by strategy-stealing
  E.g.: Hex (first player wins), Hex + swap (second player wins)
Weak solving
  Perfect play from the initial position reached with reasonable computation time
  Biggest success: draughts (tens of years of computation on tens of machines)
Strong solving
  Perfect play from any situation in reasonable time (variants of Tic-Tac-Toe)
Best results so far
  Shi-Fu-Mi (rock-paper-scissors): humans lose
  English draughts: humans + machines reach perfect play
  Chess: nobody can compete with machines
  9x9 Go: MoGoTW won with the disadvantageous side against a top player
Computational complexity:
 Main reasons for this measure?

Good feeling of understanding (disagree if you want :-) )
Explicit families of problems (extracted by reduction)
Fun
Connections with classical complexity measures
Much better for looking clever (when you speak about NP-complete problems you look clever)
Computational complexity:
  Drawbacks

Not clearly related to human/computer comparisons

Trivial games can be very complex (this measure is a worst case over situations that might never occur from the start of the game; many solvings are based on openings restricting the game)

Often based on incredibly long games
Computational complexity

Known:




Conjectured: strict inclusions everywhere.

Higher classes include undecidable cases.
Computational complexity

Given a class X, a problem q can be
in X
or at least as hard as every problem in X (X-hard)
or both (X-complete)
or neither
[Figure: diagram of the sets NP, NP-complete and NP-hard]
Complexity quiz

NP means non-polynomial?
  No. It means (roughly) “polynomial with a machine which can run several branches simultaneously”.
  It means (very roughly) “polynomial with a machine which just has to verify a proof”.
  The existence of exponential problems is known.
  Maybe P = NP is not so interesting as a question :-)
Assume P ≠ NP. NP = NP-complete ∪ P?
  No. There are intermediate problems (if P ≠ NP).
  Yet, many important NP problems are either in P or NP-complete.
Are there problems solvable in exponential time but not in polynomial time?
  YES!
Are there problems solvable in quadratic time but not in linear time?
  YES! Not all problems in P have the same complexity.
Are there problems which can't be solved, even with infinite time and space?
  YES! Undecidable problems. E.g.: is there a seg-fault?
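To make the “machine which just has to verify a proof” reading of NP concrete, here is a small sketch assuming SAT as the example problem (the slides do not name one). The proof is a truth assignment, and checking it takes time linear in the size of the formula; finding it is the hard part. The clause encoding and the function name are illustrative choices.

```python
from typing import Dict, List

Clause = List[int]  # signed literals: 3 means x3, -3 means (not x3)


def verify_sat_certificate(clauses: List[Clause],
                           assignment: Dict[int, bool]) -> bool:
    """Check that `assignment` makes at least one literal true in every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    formula = [[1, -2], [2, 3], [-1, -3]]
    print(verify_sat_certificate(formula, {1: True, 2: True, 3: False}))
    print(verify_sat_certificate(formula, {1: True, 2: False, 3: True}))
```

The first assignment satisfies all three clauses; the second violates the last clause, so the verifier rejects it.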
Computational complexity

For evaluating the complexity of your game:

1. Generalize your game to any size (non trivial for chess)
2. Consider the problem:
 - here is a board
 - is the situation a win in perfect play ?


How to show X-completeness

The problem is in X: show that you can
solve it with resources allowed in class X.

The problem is complete: show that you
can encode an X-complete problem in your
problem.



Computational complexity

==> cast into a decision problem (binary question)

==> can be used for choosing an optimal move (but not necessarily)

==> trivial games can be EXPTIME-hard

==> no clear correlation with the fact that a game is difficult
    for a computer (when compared to humans)


A PSPACE-complete problem: planar
 generalized geography
- A graph (directed, planar) is given.
- Each player follows an edge (in turn).
- Repetition is not allowed.
- The first player who can't play loses.

==> Is there a winning strategy for the first player?
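The rules above can be sketched as a brute-force solver. This is only an illustration under assumptions: the graph is a plain adjacency dictionary, "repetition" is read as "no vertex is visited twice", planarity is not checked, and the names are hypothetical. The recursion is a depth-first search, so memory stays polynomial in the graph size while time can be exponential.

```python
from typing import Dict, FrozenSet, List

Graph = Dict[str, List[str]]  # directed graph as adjacency lists


def mover_wins(graph: Graph, current: str, visited: FrozenSet[str]) -> bool:
    """True if the player about to move from `current` can force a win.

    `visited` holds the vertices already used, including `current`.
    A player with no unvisited successor cannot play and loses.
    """
    for nxt in graph.get(current, []):
        if nxt not in visited and not mover_wins(graph, nxt, visited | {nxt}):
            return True  # this move leaves the opponent in a losing position
    return False


def first_player_wins(graph: Graph, start: str) -> bool:
    """Decide the game when the token starts on `start`."""
    return mover_wins(graph, start, frozenset({start}))


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["c"], "c": []}
    fork = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(first_player_wins(path, "a"), first_player_wins(fork, "a"))
```

On the path a -> b -> c the first player moves to b, the second to c, and the first player is stuck. On the fork the first player goes straight to c and wins.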
Another PSPACE-complete problem:
quantified boolean formula




          True or false ?
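Since the formula shown on the slide did not survive, here is a hedged sketch of what "true or false?" means for a quantified boolean formula, using made-up examples. The prefix is a list of quantifiers and the matrix is an ordinary Python predicate; both representations are assumptions for illustration. The evaluator recurses over the prefix, which needs memory proportional to the number of variables only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Prefix = List[Tuple[str, str]]           # ("forall" | "exists", variable name)
Matrix = Callable[[Dict[str, bool]], bool]


def eval_qbf(prefix: Prefix, matrix: Matrix,
             env: Optional[Dict[str, bool]] = None) -> bool:
    """Evaluate a closed quantified boolean formula by exhaustive recursion."""
    env = dict(env or {})
    if not prefix:
        return matrix(env)
    quantifier, var = prefix[0]
    if quantifier not in ("forall", "exists"):
        raise ValueError(f"unknown quantifier: {quantifier}")
    results = (eval_qbf(prefix[1:], matrix, {**env, var: value})
               for value in (False, True))
    return all(results) if quantifier == "forall" else any(results)


if __name__ == "__main__":
    differ = lambda e: e["x"] != e["y"]
    # For every x there is a y different from x: true.
    print(eval_qbf([("forall", "x"), ("exists", "y")], differ))
    # There is a y different from every x: false.
    print(eval_qbf([("exists", "y"), ("forall", "x")], differ))
```

Swapping the two quantifiers changes the answer, which is the same alternation that appears between the two players of a game.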
An EXPTIME-complete problem: does a
Turing machine halt in n steps?

- A program is given.
- A number n is given.
- Will the program halt within n time steps?

Best solution: simulate.
Cost: n (which is exponential in log(n), the size of n written in binary!)
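A small sketch of the "simulate" solution, with a program modelled as a step function on states (an assumption made for brevity; a real Turing machine simulator would carry a tape and a transition table). The names `halts_within` and `countdown` are hypothetical.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def halts_within(step: Callable[[S], Optional[S]], state: S, n: int) -> bool:
    """Run at most n steps; `step` returns None when the program halts.

    The halting transition itself counts as one step.
    """
    for _ in range(n):
        nxt = step(state)
        if nxt is None:
            return True
        state = nxt
    return False


def countdown(k: int) -> Optional[int]:
    """Toy program: decrement a counter and halt when it reaches zero."""
    return None if k == 0 else k - 1


if __name__ == "__main__":
    print(halts_within(countdown, 5, 6))   # enough steps
    print(halts_within(countdown, 5, 5))   # one step short
    n = 1_000_000
    print(n.bit_length())                  # bits needed to write n
```

The loop runs up to n times, while the input only needs about log2(n) bits to state n. For n = 1,000,000 that is 20 bits, so the running time is exponential in the input size.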
Partial observability




Here discussed in the compact case
        (i.e. representation by formula)

Usually, compact ==> bigger cost
          (e.g. P ==> PSPACE)
Partial observability (structured)
(more difficult than an opponent)
P(success) > c is undecidable (proba + opponent, if no time limit).
==> analyzing P(success) = 1 (no proba).
Phantom-games >>> POMDP

See Rintanen 03 (case with formulae).
Phantom-games & POMDP
with infinite horizon

Madani et al: infinite time POMDP are
   undecidable.
Auger, Teytaud: finite time deterministic
   games are undecidable.

Undecidability of phantom-Go ?
PSPACE vs EXPTIME

==> many important games are either PSPACE or EXPTIME


Theorem: if playing = filling a location forever, then the game is in PSPACE
(not necessarily PSPACE-complete!)


Proof: depth-first search (a game lasts at most as many moves as there are locations, so the search stack stays polynomial).
Applications: Hex, Havannah, Tic-Tac-Toe, Atari-Go...
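The depth-first argument can be sketched on Tic-Tac-Toe, one of the games listed. This is an illustrative negamax search, not code from the talk: it keeps a single board that it edits and restores, so memory is bounded by the recursion depth, which is at most the number of cells, even though the running time is exponential in general.

```python
from typing import List, Optional

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def solve(board: List[str], player: str) -> int:
    """Value for `player` to move: +1 win, 0 draw, -1 loss."""
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:   # the opponent's last move completed a line
        return -1
    if "." not in board:            # board full, nobody won
        return 0
    best = -1
    for i in range(9):
        if board[i] == ".":
            board[i] = player       # fill a location...
            best = max(best, -solve(board, opponent))
            board[i] = "."          # ...and undo it when backtracking
            if best == 1:
                break               # a winning move is enough
    return best


if __name__ == "__main__":
    print(solve(["."] * 9, "X"))    # 0: perfect play is a draw
```

Because each move fills a cell for good, the recursion depth never exceeds 9 here, and more generally never exceeds the number of locations on the board.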
Appendix 1: the game of Go
Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9        MoGo
2008: win against a pro (4p) 19x19, H8    CrazyStone
2008: win against a pro (4p) 19x19, H7    CrazyStone
2009: win against a pro (9p) 19x19, H7        MoGo
2009: win against a pro (1p) 19x19, H6        MoGo

2007: win against a pro (5p) 9x9 (blitz)     MoGo
2008: win against a pro (5p) 9x9 white       MoGo
2009: win against a pro (5p) 9x9 black       MoGo
2009: win against a pro (9p) 9x9 white       Fuego
2009: win against a pro (9p) 9x9 black       MoGoTW

==> still 6 stones at least!
Game of Go (9x9 here)
Game of Go - capture
Game of Go: counting territories
(white has 7.5 “bonus” as black starts)
Game of Go: the rules
Black plays at the blue circle: the white group dies (it is removed).

It's impossible to kill white (two “eyes”).

“Ko” rules: we don't come back to the same situation.
(without ko: “PSPACE-hard”; with ko: “EXPTIME-complete”)

At the end, we count territories
==> black starts, so +7.5 for white.
NP / PSPACE / EXPTIME in Go
Tsumegos with no ko, forced moves only for W, 2 moves for B, polynomial length: NP-complete
Atari Go: PSPACE
Go without ko: PSPACE-hard
Go with ko + Japanese rules: EXPTIME-complete
Go with ko + superko: unknown (EXPSPACE?)
Some phantom-rengo undecidable?

If Go with ko > Go without ko, then PSPACE ≠ EXPTIME
NP / PSPACE / EXPTIME in Go




Encoding the formula in a ladder:
Appendix 2: what is difficult for
computers ? Visual things ?




Easy for computers ... because
human knowledge easy to encode.




Difficult for computers
Very difficult for computers.




A trivial semeai

Plenty of equivalent situations!
They are randomly sampled, with no generalization.
50% of estimated win probability!
It does not work. Why?

50% of estimated win probability!

In the first node:
 The first simulations give ~ 50%
 The next simulations go to 100% or 0% (depending on the chosen move)
 But, then, we switch to another node (~ 8! x 8! such nodes)
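A toy model can illustrate the ~50% estimate. Everything in it is an assumption made for illustration and not the board position from the slides: each side needs to fill 8 liberties, Black moves first, and in a random playout each move lands in the semeai only with probability `p_semeai`. With correct play (always answering in the semeai) Black wins every time, yet uniform random playouts report something close to a coin flip.

```python
import random


def random_playout(liberties: int, p_semeai: float, rng: random.Random) -> bool:
    """Return True if Black, who moves first, wins one random playout."""
    black_left = liberties  # White liberties that Black still has to fill
    white_left = liberties  # Black liberties that White still has to fill
    while True:
        if rng.random() < p_semeai:      # Black plays inside the semeai
            black_left -= 1
            if black_left == 0:
                return True
        if rng.random() < p_semeai:      # White plays inside the semeai
            white_left -= 1
            if white_left == 0:
                return False


def estimated_win_rate(liberties: int = 8, p_semeai: float = 0.5,
                       playouts: int = 20000, seed: int = 0) -> float:
    """Monte-Carlo estimate of Black's win probability in the toy semeai."""
    rng = random.Random(seed)
    wins = sum(random_playout(liberties, p_semeai, rng) for _ in range(playouts))
    return wins / playouts


if __name__ == "__main__":
    print(estimated_win_rate())               # close to one half
    print(estimated_win_rate(p_semeai=1.0))   # always answering: Black always wins
```

The playouts carry almost no information about the real outcome of the race, which is why the estimate at the first node starts near 50%.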
And the humans?

50% of estimated win probability!

In the first node:
 The first simulations give ~ 50%
 The next simulations go to 100% or 0% (depending on the chosen move)
 But, then, we DON'T switch to another node
 
Semeais

Should white play in the semeai (G1) or capture (J15)?

Semeais

Should black play the semeai?
Useless!
Difficult games: Havannah

Very difficult for computers.
Conclusions + other elements

Go complexity:
 superko?
 Ishi-no-shita (captures / recaptures)?
 (more generally: characterizing strength / weakness of programs?)

Huge complexity classes for
 structured games
 partially observable games (what about phantom-games?)
 decentralized games

Great results for MCTS in GGP + difficult games. Next MCTS challenges:
 Partially observable cases & large horizon: cf. Cazenave, Rolet
 Solve main weaknesses of MCTS
   (learning the MC? Meta-actions? Nested MC? Mixing with value-function as in Amazons?)
Biblio
Complexity: Robson, Tromp, Taylor, Crasmaru, ...
Bandits: Lai, Robbins, Auer, Cesa-Bianchi...
UCT: Kocsis, Szepesvari, Coquelin, Munos...
MCTS (Go): Coulom, Chaslot, Fiter, Gelly, Hoock, Silver, Muller, Pérez, Rimmel, Wang...
Tree + DP for industrial applications: Péret, Garcia...
Bandits with infinitely many arms: Audibert, Coulom, Munos, Wang...
Applications far from Go: Rolet, Teytaud (F), Rimmel, De Mesmay...
Links with “macro-actions”?
Parallelization, mixing with offline learning, bias...
Contributors
Paul Veyssière
Hassen Doghmen
Amine Bourki
Matthieu Coulm
Colleagues from NUTN and CJCU


Theory of games

  • 1. Bandit-based Monte-Carlo planning: the game of Go and beyond Games Olivier.Teytaud@inria.fr + too many people for being all cited. Includes Inria, Cnrs, Univ. Paris-Sud, LRI, CMAP, Univ. Amsterdam, Taiwan universities (including NUTN) TAO, Inria-Saclay IDF, Cnrs 8623, Lri, Univ. Paris-Sud, Digiteo Labs, Pascal Network of Excellence. Tao, January 2010+updated 2012.
  • 3. Introduction to games Partially or fully observable (“phantom” games) Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 4. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 5. Introduction to games Partially or fully observable Randomized or not Iterated or not (reputation) 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 6. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 7. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not (rengo)
  • 8. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 9. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 10. Introduction to games Partially or fully observable Randomized or not Iterated or not 1,2,3,... players Decentralized or not Continuous or not Infinite time or not
  • 12-19. Complexity measures (not always well defined) State-space complexity = number of possible states; Game-tree size = number of leaves; Decision complexity = minimum number of leaves of a tree showing perfect play; Game-tree complexity = number of leaves for perfect play with constant depth; Computational complexity (= complexity classes, discussed later); Perfect-play complexity (complexity of a perfect algorithm); State of the art level
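The first two measures can be computed exhaustively for a very small game. The following is a minimal sketch, assuming Tic-Tac-Toe as the example game (it is not the game analysed in the slides) and assuming that play stops as soon as a line is completed; the helper names are illustrative only.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(board):
    """A position is terminal when someone has won or the board is full."""
    return winner(board) is not None or "." not in board


def successors(board):
    """All positions reachable in one move (X moves first)."""
    player = "X" if board.count("X") == board.count("O") else "O"
    return [board[:i] + player + board[i + 1:]
            for i, cell in enumerate(board) if cell == "."]


def reachable_states(start="." * 9):
    """State-space complexity: set of positions reachable from the start."""
    seen = {start}
    stack = [start]
    while stack:
        board = stack.pop()
        if is_terminal(board):
            continue
        for nxt in successors(board):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


@lru_cache(maxsize=None)
def count_leaves(board="." * 9):
    """Game-tree size: number of leaves, i.e. of distinct complete games."""
    if is_terminal(board):
        return 1
    return sum(count_leaves(nxt) for nxt in successors(board))


state_space_complexity = len(reachable_states())
game_tree_size = count_leaves()
print(state_space_complexity, game_tree_size)
```

The two numbers differ by a large factor because many move orders lead to the same position: the state space counts positions once, the game tree counts every path.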
  • 20. State of the art level Very weak solving Means that we know who should win Typically proved by strategy-stealing E.g.: hex (first player wins), hex + swap (second player wins) Weak solving Strong solving Best results so far
  • 21. State of the art level Very weak solving Weak solving Perfect play reached with reasonable computation time Biggest success: draughts (tens of years of computation on tens of machines) Strong solving Best results so far
  • 22. State of the art level Very weak solving Weak solving Strong solving Perfect play from any situation in reasonable time (variants of Tic-Tac-Toe) Best results so far
  • 23. State of the art level Very weak solving Weak solving Strong solving Best results so far Shi-Fu-Mi (rock-paper-scissors): humans lose English draughts: humans + machines reach perfect play Chess: nobody can compete with machines 9x9 Go: MoGoTW won with the disadvantageous side against a top player
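Strong solving, as defined above, means perfect play from any situation. A minimal sketch of what that looks like for plain Tic-Tac-Toe (an assumed example; the slides only mention "variants of Tic-Tac-Toe") is a memoized minimax that returns the game value of an arbitrary position. The function names are illustrative.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board):
    """Value for X under perfect play: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    # X moves first, so X is to move when both players have as many stones.
    player = "X" if board.count("X") == board.count("O") else "O"
    children = [value(board[:i] + player + board[i + 1:])
                for i, cell in enumerate(board) if cell == "."]
    return max(children) if player == "X" else min(children)


print(value("." * 9))      # value of the empty board
print(value("XX.OO...."))  # X to move and can complete the top row
```

Because the table is filled for every position the search visits, the same function answers queries from any situation, which is what separates strong solving from weak solving (perfect play from the initial position only).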
  • 24. Games Introduction Complexity measures Computational complexity Partial observability Zoology
  • 25. Computational complexity: Main reasons for this measure ? Good feeling of understanding (disagree if you want :-) ) Explicit families of problems (extracted by reduction) Fun Connections with classical complexity measures Much better for looking clever (when you speak about NP-complete problems you look clever)
  • 26. Computational complexity: Drawbacks Not clearly related to human/computer comparisons Trivial games can be very complex (this measure is a worst case over situations that might never occur from the start of the game; many solutions are based on openings restricting the game) Often based on incredibly long games
  • 27. Computational complexity Known: Conjectured: strict inclusions everywhere. Higher classes include undecidable cases.
  • 28. Computational complexity Given a class X, a problem q can be in X, or at least as hard as the problems in X (X-hard), or both (X-complete), or neither (Figure: NP, NP-hard, NP-complete)
  • 29. Complexity quiz (1) NP means non-polynomial? (2) Assume P ≠ NP. NP = NP-complete ∪ P? (3) Are there problems solvable in exponential time but not in polynomial time? (4) Are there problems solvable in quadratic time but not in linear time? (5) Are there problems which can't be solved, even with infinite time and space?
  • 30. Complexity quiz, answer to (1): No. It means (roughly) “polynomial with a machine which can run several branches simultaneously”. It means (very roughly) “polynomial with a machine which just has to verify a proof”. The existence of exponential problems is known. Maybe P = NP is not so interesting as a question :-)
  • 32. Complexity quiz, answer to (2): No. There are intermediate problems (if P ≠ NP). Yet, many important NP problems are either in P or NP-complete.
  • 34. Complexity quiz, answer to (3): YES, YES, and YES.
  • 36. Complexity quiz, answer to (4): YES! Not all problems in P have the same complexity.
  • 38. Complexity quiz, answer to (5): YES! Undecidable problems. E.g.: is there a seg-fault?
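The "machine which just has to verify a proof" reading of NP can be illustrated with Boolean satisfiability. This is a minimal sketch, assuming a DIMACS-style clause encoding (an illustrative choice, not taken from the slides): checking a proposed assignment takes time linear in the formula size, whereas no polynomial-time method for finding one is known.

```python
def verify_sat(clauses, assignment):
    """Check a certificate for a CNF formula in linear time.

    clauses: list of clauses, each a list of nonzero ints; literal k means
             variable k is true, literal -k means variable k is false.
    assignment: dict mapping each variable number to True or False.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(verify_sat(formula, {1: True, 2: True, 3: False}))
print(verify_sat(formula, {1: True, 2: False, 3: True}))
```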
  • 39. Computational complexity For evaluating the complexity of your game: 1. Generalize your game to any size (non trivial for chess) 2. Consider the problem: - here is a board - is the situation a win in perfect play? (Figure: NP, NP-hard, NP-complete)
  • 40. How to show X-completeness The problem is in X: show that you can solve it with the resources allowed in class X. The problem is complete: show that you can encode an X-complete problem in your problem. (Figure: NP, NP-hard, NP-complete)
  • 41. Computational complexity ==> cast into a decision problem (binary question) ==> can be used for choosing the optimal move (but not necessary) ==> trivial games can be EXPTIME-hard ==> no clear correlation with the fact that a game is difficult for a computer (when compared to humans) (Figure: NP, NP-hard, NP-complete)
  • 42. A PSPACE-complete problem: planar generalized geography - A graph (directed, planar) is given. - Each player follows an edge (in turn). - Repetition is not allowed. - The first player who can't play loses. ==> A winning strategy for the first player?
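The rules of generalized geography translate directly into a depth-first search. The sketch below assumes the usual vertex version (no node may be visited twice) and does not check planarity; the graph encoding and function name are illustrative. It uses memory proportional to the number of nodes, although the running time can be exponential.

```python
def first_player_wins(graph, start):
    """Decide generalized geography by exhaustive depth-first search.

    graph: dict mapping each node to the list of its successors.
    start: node where the token initially sits (counts as visited).
    Returns True if the player who moves first has a winning strategy.
    """
    visited = {start}

    def mover_wins(node):
        # The player to move wins if some legal move leads to a position
        # where the opponent (the next mover) loses.
        for nxt in graph.get(node, ()):
            if nxt in visited:
                continue  # repetition is not allowed
            visited.add(nxt)
            opponent_wins = mover_wins(nxt)
            visited.remove(nxt)  # undo the move before trying the next one
            if not opponent_wins:
                return True
        return False  # no move, or every move lets the opponent win

    return mover_wins(start)


print(first_player_wins({"a": ["b"], "b": ["c"], "c": []}, "a"))
print(first_player_wins({"a": ["b", "c"], "b": ["d"], "c": [], "d": []}, "a"))
```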
  • 43. Another PSPACE-complete problem: quantified Boolean formula. True or false?
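Deciding a quantified Boolean formula can be sketched as a recursion over the quantifier prefix. The encoding below (a list of quantifier/variable pairs plus a Python function for the quantifier-free part) is an assumption made for illustration; the formula on the original slide is not reproduced here. The recursion depth equals the number of variables, which is why polynomial space suffices.

```python
def eval_qbf(prefix, matrix, assignment=None):
    """Evaluate a closed quantified Boolean formula.

    prefix: list of (quantifier, variable) pairs, outermost first, with
            quantifier equal to "forall" or "exists".
    matrix: function taking a dict {variable: bool} and returning a bool.
    """
    assignment = dict(assignment or {})
    if not prefix:
        return matrix(assignment)
    (quantifier, var), rest = prefix[0], prefix[1:]
    # Try both truth values of the outermost variable.
    results = (eval_qbf(rest, matrix, {**assignment, var: val})
               for val in (False, True))
    return all(results) if quantifier == "forall" else any(results)


def differ(a):
    """Quantifier-free part: x and y take different values."""
    return a["x"] != a["y"]


# For every x there is a y different from x: true.
print(eval_qbf([("forall", "x"), ("exists", "y")], differ))
# There is a y different from every x: false.
print(eval_qbf([("exists", "y"), ("forall", "x")], differ))
```

The two calls show that the order of quantifiers matters, which is the same alternation as "I have a move such that for every reply of yours..." in a two-player game.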
  • 44. An EXPTIME-complete problem: does a Turing machine halt in n steps? - A program is given. - A number n is given. - Will the program halt within n time steps? Best solution: simulate. Cost: n steps, which is exponential in log(n), the size of the binary encoding of n!
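The "simulate" solution can be sketched with Python generators standing in for Turing machines. This modelling choice is an assumption for illustration: each resumption of the generator counts as one step, and the names are hypothetical.

```python
def halts_within(program, n):
    """Simulate `program` for at most n steps and report whether it halted.

    program: zero-argument callable returning a generator; every `yield`
             marks the end of one step.
    """
    run = program()
    for _ in range(n):
        try:
            next(run)
        except StopIteration:
            return True  # the program finished within the budget
    return False


def countdown(k):
    """Toy program that performs k steps and then halts."""
    while k > 0:
        k -= 1
        yield


def loop_forever():
    """Toy program that never halts."""
    while True:
        yield


n = 1_000_000
print(halts_within(lambda: countdown(5), n))
print(halts_within(loop_forever, 1000))
# The input n is written with only n.bit_length() bits, but the simulation
# may need n steps: exponential in the size of the input.
print(n.bit_length())
```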
  • 46. Partial observability Here discussed in the compact case (i.e. representation by formula) Usually, compact ==> bigger cost (e.g. P ==> PSPACE)
  • 47. Partial observability (structured) (more difficult than an opponent) P(success)>c is undecidable (proba+opponent) (if no time limit.) ==> analyzing P(success)=1 (no proba).
  • 48. Phantom-games >>> POMDP See Rintanen 03 (case with formulae).
  • 53. Phantom-games & POMDP with infinite horizon Madani et al: infinite-horizon POMDPs are undecidable. Auger, Teytaud: finite time deterministic games are undecidable. Undecidability of phantom-Go?
  • 55. PSPACE vs EXPTIME ==> many important games are either PSPACE or EXPTIME Theorem: If playing = filling a location for eternity, then it is PSPACE. (not necessarily PSPACE-complete!) Proof: Depth-first search. Applications: Hex, Havannah, Tic-Tac-Toe, Atari-Go...
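The depth-first-search argument can be made concrete on Tic-Tac-Toe, one of the games listed. This is a minimal sketch under the assumption that every move fills one empty cell forever: the search keeps a single board that it modifies and restores, so memory is one board plus a recursion stack no deeper than the number of cells, even though the running time may be exponential. The `stats` dictionary is an illustrative addition for measuring the depth.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def has_line(board, player):
    """True if `player` occupies all three cells of some line."""
    return any(all(board[i] == player for i in line) for line in LINES)


def solve(board, player, depth=0, stats=None):
    """Negamax value for the player to move: +1 win, 0 draw, -1 loss.

    board is a mutable list of 9 cells; it is restored before returning.
    """
    if stats is not None:
        stats["max_depth"] = max(stats["max_depth"], depth)
    opponent = "O" if player == "X" else "X"
    if has_line(board, opponent):
        return -1  # the previous move completed a line
    empties = [i for i, cell in enumerate(board) if cell == "."]
    if not empties:
        return 0
    best = -1
    for i in empties:
        board[i] = player            # fill a location
        score = -solve(board, opponent, depth + 1, stats)
        board[i] = "."               # undo, so memory use stays constant
        best = max(best, score)
        if best == 1:
            break                    # a winning move is enough
    return best


stats = {"max_depth": 0}
board = ["."] * 9
result = solve(board, "X", 0, stats)
print(result, stats["max_depth"])
```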
  • 57. Appendix 1: the game of Go
  • 58. Go: from 29 to 6 stones. 19x19 with handicap (H): 1998: loss against an amateur (6d), H29; 2008: win against a pro (8p), H9 (MoGo); 2008: win against a pro (4p), H8 (CrazyStone); 2008: win against a pro (4p), H7 (CrazyStone); 2009: win against a pro (9p), H7 (MoGo); 2009: win against a pro (1p), H6 (MoGo). 9x9: 2007: win against a pro (5p), blitz (MoGo); 2008: win against a pro (5p), white (MoGo); 2009: win against a pro (5p), black (MoGo); 2009: win against a pro (9p), white (Fuego); 2009: win against a pro (9p), black (MoGoTW). ==> still 6 stones at least!
  • 59. Game of Go (9x9 here)
  • 63. Game of Go - capture
  • 66. Game of Go: counting territories (white has 7.5 “bonus” as black starts)
  • 67. Game of Go: the rules Black plays at the blue circle: the white group dies (it is removed) It's impossible to kill white (two “eyes”). “Ko” rules: we don't come back to the same situation. (without ko: “PSPACE hard” with ko: “EXPTIME-complete”) At the end, we count territories ==> black starts, so +7.5 for white.
  • 68. NP / PSPACE / EXPTIME in Go Tsumegos with no ko, forced moves only for W, 2 moves for B, polynomial length: NP-complete; Atari Go: PSPACE; Go without ko: PSPACE-hard; Go with ko + Japanese rules: EXPTIME-complete; Go with ko + superko: unknown (EXPSPACE?); Some phantom-rengo undecidable? If Go with ko > Go without ko, then PSPACE ≠ EXPTIME
  • 69. NP / PSPACE / EXPTIME in Go Encoding the formula in a ladder:
  • 70. Appendix 2: what is difficult for computers? Visual things?
  • 71. Easy for computers ... because human knowledge is easy to encode.
  • 72. Difficult for computers. Very difficult for computers.
  • 73. A trivial semeai Plenty of equivalent situations! They are randomly sampled, with no generalization. 50% of estimated win probability!
  • 74. Semeai Plenty of equivalent situations! They are randomly sampled, with no generalization. 50% of estimated win probability!
  • 84. It does not work. Why? 50% of estimated win probability! In the first node: the first simulations give ~ 50%; the next simulations go to 100% or 0% (depending on the chosen move); but, then, we switch to another node (~ 8! x 8! such nodes)
  • 85. And the humans? 50% of estimated win probability! In the first node: the first simulations give ~ 50%; the next simulations go to 100% or 0% (depending on the chosen move); but, then, we DON'T switch to another node
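The order of magnitude "8! x 8!" can be checked with a short calculation. The reading used here is an assumption: each of the two players has 8 moves that can be played in any order, giving 8! orderings per player. The simulation budget is a hypothetical figure chosen only to show the scale.

```python
import math

# Assumed interpretation: 8 interchangeable moves per player, any order.
orderings_per_player = math.factorial(8)
equivalent_nodes = orderings_per_player ** 2

# Hypothetical budget, not a figure from the slides.
simulation_budget = 1_000_000
simulations_per_node = simulation_budget / equivalent_nodes

print(orderings_per_player)
print(equivalent_nodes)
print(simulations_per_node)
```

With far fewer simulations than equivalent nodes, a search that treats each node separately cannot refine its estimate in most of them, which is consistent with the estimate staying near 50%.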
  • 90. Difficult games: Havannah Very difficult for computers.
  • 91. Conclusions + other elements. Go complexity: superko? Ishi-no-shita (captures / recaptures)? (More generally: characterizing the strengths / weaknesses of programs?) Huge complexity classes for structured games: partially observable games (what about phantom-games?), decentralized games. Great results for MCTS in GGP + difficult games. Next MCTS challenges: partially observable cases & large horizon (cf. Cazenave, Rolet); solving the main weaknesses of MCTS (learning the MC? Meta-actions? Nested MC? Mixing with a value function as in Amazons?)
  • 92. Biblio Complexity: Robson, Tromp, Taylor, Crasmaru, ... Bandits: Lai, Robbins, Auer, Cesa-Bianchi... UCT: Kocsis, Szepesvari, Coquelin, Munos... MCTS (Go): Coulom, Chaslot, Fiter, Gelly, Hoock, Silver, Muller, Pérez, Rimmel, Wang... Tree + DP for industrial applications: Péret, Garcia... Bandits with infinitely many arms: Audibert, Coulom, Munos, Wang... Applications far from Go: Rolet, Teytaud (F), Rimmel, De Mesmay ... Links with “macro-actions”? Parallelization, mixing with offline learning, bias...
  • 93. Contributors: Paul Veyssière, Hassen Doghmen, Amine Bourki, Matthieu Coulm, colleagues from NUTN and CJCU