Meta Monte-Carlo Tree Search

This document discusses using Meta-Monte-Carlo Tree Search (Meta-MCTS) to build an opening book for 7x7 Go. Meta-MCTS learns against an MCTS sparring partner into which human variations (from Senseis) are introduced. The bot, MoGoTW, won all its games against professionals, as Black with komi 8.5 and as White with komi 9.5, yet at least one variation was found in which it did not play correctly. The document concludes that Meta-MCTS performed well on its own but that human input helped, and that exactly solving 7x7 Go would require the huge work of collecting and solving all leaves of the opening book.

- Go is not solved beyond 6x6.
- We build an opening book for 7x7 in order to
approximately solve 7x7 Go.
Widely believed: perfect play is a draw with komi 9.

Our tools

1) Monte-Carlo Tree Search
2) Meta-Monte-Carlo Tree Search
3) Senseis' partial solution
(using in particular Davies' work)

Monte-Carlo Tree Search

Coulom 2006,
Kocsis-Szepesvari 2006.

= combining tree search
and Monte-Carlo evaluation

UCT (Upper Confidence Trees)

Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis Szepesvari (06)

Exploitation ...
SCORE = 5/7 + k.sqrt( log(10)/7 )

... or exploration ?
SCORE = 0/2 + k.sqrt( log(10)/2 )
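
The two scores above can be checked numerically. This is a minimal sketch, assuming the natural logarithm and an exploration constant k = 1 by default (the slides specify neither); `uct_score` is a hypothetical helper name, not taken from the slides.

```python
import math

def uct_score(wins, visits, parent_visits, k=1.0):
    """Empirical win rate plus an exploration bonus (UCB-style score)."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# Slide example: the parent node has 10 simulations.
# One move has 5 wins out of 7 simulations, another has 0 wins out of 2.
exploit = uct_score(5, 7, 10)
explore = uct_score(0, 2, 10)
print(f"well-explored move: {exploit:.3f}")
print(f"rarely tried move:  {explore:.3f}")
```

With k = 1 the well-explored move scores higher; with a larger constant (above roughly 1.4 under these assumptions) the rarely tried move overtakes it, which is the exploitation/exploration trade-off the slides illustrate.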

Meta-Monte-Carlo
Tree Search

= Monte-Carlo Tree Search
with Monte-Carlo replaced by
MCTS

Meta-Monte-Carlo
Tree Search
I.e.:
MCTS
= MC play-outs + Tree Search
Meta-MCTS
= MCTS play-outs + Tree Search
Meta-Meta-MCTS
= Meta-MCTS play-outs + TreeSearch
...
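
The recursion above can be sketched on a toy game. This is one possible reading of the slides, not MoGoTW's implementation: the game (take 1 to 3 stones, taking the last stone wins), the function names, and the simulation counts are all assumptions chosen to keep the example small.

```python
import math
import random

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; taking the last stone wins

def legal_moves(stones):
    return [m for m in MOVES if m <= stones]

def random_policy(stones, rng):
    """Level-0 play-out policy: a uniformly random legal move."""
    return rng.choice(legal_moves(stones))

def playout(stones, to_move, policy, rng):
    """Play to the end with `policy` for both sides; return the winner (0 or 1)."""
    while True:
        stones -= policy(stones, rng)
        if stones == 0:
            return to_move  # this player took the last stone
        to_move = 1 - to_move

def tree_search(stones, policy, rng, n_sims=200, k=1.0):
    """UCT from the root; new nodes are evaluated by play-outs using `policy`."""
    stats = {}  # (stones, move) -> [wins for the player making the move, visits]
    for _ in range(n_sims):
        s, player, path = stones, 0, []
        while s > 0:
            moves = legal_moves(s)
            untried = [m for m in moves if (s, m) not in stats]
            if untried:
                m = rng.choice(untried)  # expansion
            else:
                total = sum(stats[(s, x)][1] for x in moves)
                m = max(moves, key=lambda x: stats[(s, x)][0] / stats[(s, x)][1]
                        + k * math.sqrt(math.log(total) / stats[(s, x)][1]))
            path.append((s, m, player))
            s -= m
            player = 1 - player
            if untried:
                break  # stop descending at a newly expanded node
        # Terminal position: the last mover won; otherwise run a play-out.
        winner = path[-1][2] if s == 0 else playout(s, player, policy, rng)
        for ps, pm, pp in path:
            entry = stats.setdefault((ps, pm), [0, 0])
            entry[0] += (winner == pp)
            entry[1] += 1
    # Return the most visited root move.
    return max(legal_moves(stones), key=lambda m: stats[(stones, m)][1])

def make_policy(level, n_sims=30):
    """Level 0 = random moves; level n = moves chosen by a level n-1 tree search."""
    if level == 0:
        return random_policy
    inner = make_policy(level - 1, n_sims)
    return lambda stones, rng: tree_search(stones, inner, rng, n_sims)

rng = random.Random(0)
mcts_move = tree_search(10, make_policy(0), rng)      # MCTS: MC play-outs + tree search
meta_move = tree_search(10, make_policy(1), rng, 30)  # Meta-MCTS: MCTS play-outs + tree search
print(mcts_move, meta_move)
```

Each extra level multiplies the cost of a single play-out by roughly the cost of a full search one level below, which is why the sketch uses very small simulation counts.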

A variation which is not in Senseis' file.
Left: black E3 should be black E5.
Right: corrected version.

EXPERIMENTAL RESULTS

Meta-MCTS learns against an
MCTS sparring partner.
We introduce Senseis' variations
into this sparring partner
during the Meta-MCTS run.
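
One way to picture a sparring partner that accepts injected variations is a move book consulted before a fallback search. The sketch below is purely illustrative: the class name, the prefix-keyed dictionary, and the placeholder move labels (`m1`, `m2`, ...) are assumptions, and the fallback stands in for the MCTS player.

```python
class SparringPartner:
    """Plays a known variation when the game so far matches one, else falls back."""

    def __init__(self, fallback):
        self.fallback = fallback  # callable: history -> move (e.g. an MCTS player)
        self.book = {}            # tuple of moves played so far -> next move

    def add_variation(self, moves):
        """Register a line: after each prefix of it, the next move of the line is played."""
        for i in range(len(moves)):
            self.book[tuple(moves[:i])] = moves[i]

    def play(self, history):
        move = self.book.get(tuple(history))
        return move if move is not None else self.fallback(history)

# Placeholder fallback and placeholder move labels, for illustration only.
partner = SparringPartner(fallback=lambda history: "pass")
partner.add_variation(["m1", "m2", "m3"])
print(partner.play(["m1"]))  # follows the injected line
print(partner.play(["mX"]))  # unknown position: falls back
```

Variations can be added at any time during a run, which matches the idea of introducing them while Meta-MCTS is still learning.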

Learning curve of black by Meta-MCTS:
- X-axis = log2(number of playouts)
- Y-axis = moving average (window size 55)
of winning rate in playouts
Playouts = MCTS (it's Meta-MCTS)
Komi = 8.5 (winning with komi 8.5
ensures a draw with komi 9)
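
The axes of such a curve can be computed as follows. This is a small sketch under assumptions: `outcomes` is a hypothetical list of 0/1 play-out results (1 = win), and points are only emitted once a full window of 55 results is available.

```python
import math
from collections import deque

def learning_curve(outcomes, window=55):
    """Return (log2(play-out index), moving-average win rate) points."""
    recent = deque(maxlen=window)
    running = 0
    points = []
    for i, won in enumerate(outcomes, start=1):
        if len(recent) == window:
            running -= recent[0]  # oldest result is about to leave the window
        recent.append(won)
        running += won
        if len(recent) == window:
            points.append((math.log2(i), running / window))
    return points

# Synthetic data for illustration only: 55 wins followed by 55 losses.
demo = learning_curve([1] * 55 + [0] * 55)
print(demo[0], demo[-1])
```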

Learning curve of white by Meta-MCTS:
- X-axis = log2(number of playouts)
- Y-axis = moving average (window size 55)
of winning rate in playouts
Playouts = MCTS (it's Meta-MCTS)
Komi = 9.5 (winning with komi 9.5
ensures a draw with komi 9)
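
The komi argument on the two slides above can be spelled out with a few lines of arithmetic. The sketch assumes the board margin (Black's points minus White's) is an integer; `result` is a hypothetical helper.

```python
def result(margin, komi):
    """Outcome for a given board margin (Black minus White) and komi."""
    if margin > komi:
        return "black wins"
    if margin < komi:
        return "white wins"
    return "draw"

# Black winning at komi 8.5 means a margin of at least 9;
# White winning at komi 9.5 means a margin of at most 9.
for margin in range(7, 12):
    print(margin, result(margin, 8.5), result(margin, 9.5), result(margin, 9))
```

Under this assumption, a side that can force a win at its half-point komi can force at least a draw at komi 9, and 9 is the only margin consistent with both sides doing so.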

Drops in the curve = introduction
of Senseis' variations into the opponent.
Conclusion = the algorithm did not find
all these variations on its own ==> human input needed.

Games against pros

MoGoTW is black.

MoGoTW is white.

With komi 9.5, MoGoTW won everything as White.
With komi 8.5, MoGoTW won everything as Black.

Exciting!
Were all MoGoTW's moves perfect ?

No :-(

In one game (at least) the human might
have won.

Left: this game was won by MoGoTW as black.
Chun-Yen Lin (2P) made a mistake.
Right: how Chun-Yen Lin (2P) might have won the game.

So, there is still at least one variation in
which the bot does not play correctly.

We did not manually introduce a correction;
instead, we introduced the variation played by the pro
into the sparring partner.

We then see whether the bot can find a solution by itself.

Learning curve as black, after introducing the
dangerous variation in the sparring partner.

Time still logarithmic.

CONCLUSIONS

We used Meta-MCTS for building
an opening book for 7x7 Go.

We are not aware of
any remaining bad moves, which does
not mean that there are none.

Meta-MCTS did a good job by itself,
but human inputs
( = Senseis + games against
pros) have been helpful.

Towards exact solving?
= collecting all leaves of the opening book (OB)
+ solving all of them...
= huge work.
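
The first of these two steps, collecting the leaves, can be sketched as a tree traversal. The nested-dictionary representation of the opening book and the placeholder move labels are assumptions for illustration; solving each leaf position is the expensive part and is not shown.

```python
def collect_leaves(book, prefix=()):
    """Yield every move sequence that ends at a leaf of a nested-dict book."""
    if not book:
        yield prefix
        return
    for move, child in book.items():
        yield from collect_leaves(child, prefix + (move,))

# Toy book with placeholder moves: each key is a move, each value the sub-book after it.
toy_book = {"m1": {"m2": {}, "m3": {"m4": {}}}, "m5": {}}
leaves = list(collect_leaves(toy_book))
print(leaves)
```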

Other conclusions:

- 7x7 can be very hard, even for pros
(pros made mistakes).

- MCTS alone is not enough for
very strong play in 7x7
