Monte Carlo Tree Search in 2014 (MCMC days in Marseille)
1. Monte-Carlo Tree Search
O. Teytaud & colleagues
ENSL / ski 2014
In a nutshell:
- the game of Go, a great AI-complete challenge
- MCTS, a great recent tool for MDP-solving
- UCT & other maths
- unsolved stuff
(If someone solves these unsolved problems, it justifies a whole life of academic salary :-) )
9. Part I. A success story
in Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Bandits, UCT & other math. stuff
Part IV. Conclusion
10. Part I : The Success Story
(less showing off in part II :-) )
The game of Go is a beautiful challenge.
We achieved the first wins against professional players in the game of Go.
But with handicap!
19. Game of Go: counting territories
(White has a 7.5 “bonus” as Black starts.)
20. Game of Go: the rules
Black plays at the blue circle:
the white group dies (it is
removed)
It's impossible to kill white (two “eyes”).
“Superko” rule: we don't come back to the same
situation.
(without superko: “PSPACE hard”
with superko: “EXPTIME-hard”)
At the end, we count territories
==> black starts, so +7.5 for white.
34. Summary of MCTS
• While (we have time)
– S = state at which we need a decision
– Simulate randomly from S until end (using UCB)
– Update statistics
• Decision = most simulated in S
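The loop above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for Go: the toy game (a single pile, each player removes 1 or 2 stones, whoever takes the last stone wins), the names `Node`, `playout`, `simulate`, `mcts_decision` and the constant `c=1.4` are all assumptions made for the example.

```python
import math
import random

def legal_moves(stones):
    """Toy game (an assumption for this sketch): remove 1 or 2 stones."""
    return [m for m in (1, 2) if m <= stones]

class Node:
    def __init__(self, stones):
        self.stones = stones   # stones left for the player to move
        self.children = {}     # move -> Node
        self.visits = 0        # number of simulations through this node
        self.wins = 0.0        # wins of the player who moved INTO this node

def playout(stones, rng):
    """Uniformly random playout; True if the player to move first wins."""
    player = 0  # 0 = the player to move at the start of the playout
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player == 0  # whoever took the last stone wins
        player = 1 - player
    return False  # no stones left: the player to move has already lost

def simulate(node, rng, c=1.4):
    """One simulation from node; True if the player to move at node wins."""
    if node.stones == 0:
        result = False
    elif node.visits == 0:
        result = playout(node.stones, rng)  # first visit: Monte Carlo part
    else:
        for m in legal_moves(node.stones):
            node.children.setdefault(m, Node(node.stones - m))

        def ucb(child):
            if child.visits == 0:
                return float("inf")
            mean = child.wins / child.visits
            return mean + math.sqrt(c * math.log(node.visits) / child.visits)

        child = max(node.children.values(), key=ucb)  # tree part: UCB
        result = not simulate(child, rng, c)
    # Update statistics: credit the player who moved into this node.
    node.visits += 1
    node.wins += 0.0 if result else 1.0
    return result

def mcts_decision(stones, n_simulations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(n_simulations):  # "while we have time"
        simulate(root, rng)
    # Decision = most simulated move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

For instance, `mcts_decision(4)` should select the move that leaves a multiple of 3 stones to the opponent, which is the winning strategy in this toy game.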
35. UCB and its variants
• We have seen the MCTS principle
• The most classical MCTS is UCT (i.e.
MCTS with UCB)
• Let us see the UCB formula and its
properties
36. Upper Confidence Bound
Problem specified by:
- K arms
- Probability distributions $R_1, \dots, R_K$
- A budget T (# time steps)
During T time steps t=1,...,T (and t=T+1 for the final recommendation):
- we choose $a_t$ in {1,...,K}
- we get a reward $r_t$ independently drawn with distribution $R_{a_t}$
We minimize a regret:
- Cumulative regret $R = T \max_i E[R_i] - \sum_{t=1}^{T} r_t$
- Simple regret $\max_i E[R_i] - E[R_{a_{T+1}}]$
UCB: $a_t = \arg\max_a \; \mathrm{averageReward}(a) + \sqrt{C \log(t) / \mathrm{nb}(a)}$
==> reasonably good both for Simple & Cumulative
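As a rough illustration of this setting, here is a minimal Python sketch of UCB on Bernoulli arms. Several things are assumptions made for the example and are not on the slide: the function name `ucb_bandit`, Bernoulli rewards, the value `c=2.0`, playing each arm once at the start, and recommending the most played arm at time T+1.

```python
import math
import random

def ucb_bandit(arm_means, horizon, c=2.0, seed=0):
    """Run UCB for `horizon` steps on Bernoulli arms with the given means."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # nb(a): number of pulls of arm a
    sums = [0.0] * k    # cumulated reward of arm a
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    best = max(arm_means)
    # Cumulative regret: T * max_i E[R_i] minus the rewards actually collected.
    cumulative_regret = horizon * best - total_reward
    # Simple regret of the recommendation (assumed here: most played arm).
    recommended = max(range(k), key=lambda a: counts[a])
    simple_regret = best - arm_means[recommended]
    return counts, cumulative_regret, simple_regret
```

With well separated arms such as `[0.2, 0.5, 0.8]` and a few thousand steps, almost all pulls should go to the best arm, and the number of pulls of each suboptimal arm should grow only logarithmically with the horizon.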
37. Stochastic bandit
Two main assumptions (not true in MCTS):
● Stationary
● Cumulative regret
[Formula annotations: average reward for arm k; variance for arm k]
38. “UCB” ?
• I have shown the “UCB” formula (Lai, Robbins), which is
the difference between MCTS and UCT ( +sqrt(log t / nbSims) )
• The UCB formula has deep mathematical principles.
• But very far from the MCTS context (indep, regret).
• Contrary to what has often been claimed, UCB is not central in MCTS (but ok for proving convergence).
• But for publishing papers, relating MCTS to UCB is so beautiful, with plenty of maths papers in the bibliography :-)
43. Non stationary case
• Kocsis + Szepesvari 2006: UCB in non-
stationary case
• Application to UCT:
Huge problem-dependent constant.
Only for finite MDP.
$B^{D/2}$ iterations (B = branching, D = depth).
Experiments (~ αβ)
45. Now variants
• ((( f(x) = noisy function, finding x such that E f(x) is minimum for x in $[0,1]^d$ )))
• Problem with infinite action space / state space
• And algorithms which work better than UCT in the discrete case
46. Infinite action space
• E.g. actions are continuous
• Infinite branching factor
• UCB meaningless in such a case
==> progressive widening: argmax of UCBscore over the first $n^{0.2}$ options
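Progressive widening can be sketched as follows in Python. This is a hedged sketch only: the function names, the rounding of $n^{0.2}$ upwards, the constant `c=2.0` and the representation of statistics as `(sum_of_rewards, count)` pairs are assumptions, and options are assumed to be listed in a fixed order (for instance sorted by some heuristic).

```python
import math

def num_allowed_options(n_visits, exponent=0.2):
    """Number of options considered after n_visits simulations: ceil(n^0.2)."""
    return max(1, math.ceil(n_visits ** exponent))

def select_option(stats, n_visits, c=2.0, exponent=0.2):
    """Pick the argmax of the UCB score over the first n^0.2 options only.

    stats: list of (sum_of_rewards, count) pairs, one per option,
    in the fixed order in which options are progressively opened.
    """
    k = min(len(stats), num_allowed_options(n_visits, exponent))

    def score(i):
        total, count = stats[i]
        if count == 0:
            return float("inf")  # a newly opened option is tried first
        return total / count + math.sqrt(c * math.log(max(n_visits, 1)) / count)

    return max(range(k), key=score)
```

With this rule only one option is considered at the first visit, and a new one is opened each time $n^{0.2}$ crosses the next integer, so the branching factor stays sublinear in the number of simulations.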
47. Infinite MDP
• Variant of UCT (Auger et al, 2013)
• Progressive widening: consider only a
sublinear number of children nodes
• Exploration log(t) ==> $t^e$ for some e>0
Error = $O(1/n^{1/(10D)})$ exponentially surely in n
Explicit rate, but it will take time...
49. Binary rewards, without exploration
(Berthier et al, 2009)
UcbScore(move) =
meanReward(move)
+ sqrt( log(t) / nbSims(move) )
mean = (numerator+K) / (denominator + 2K)
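The regularized mean on this slide is easy to illustrate in Python. In this sketch the numerator is taken to be the number of wins and the denominator the number of simulations of the move, which is an assumption; the function names and the value `k=1.0` are hypothetical as well. Moves are compared on the regularized mean alone, without the sqrt exploration term.

```python
def regularized_mean(wins, sims, k=1.0):
    """(numerator + K) / (denominator + 2K): equals 0.5 when there is no data."""
    return (wins + k) / (sims + 2.0 * k)

def best_move(stats, k=1.0):
    """stats: dict move -> (wins, sims); argmax of the regularized mean."""
    return max(stats, key=lambda m: regularized_mean(*stats[m], k=k))
```

For example, a move with 1 win out of 1 simulation gets 2/3 with `k=1.0`, which is below a move with 80 wins out of 100 (81/102), although its raw mean is higher. Unexplored moves start at 0.5, so they are still tried when the visited moves do badly.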
50. Adversarial bandit
Different framework:
the reward is M(k,k') where
k' is chosen by an adversary
(not aware of your choice).
Criteria are a bit different,
algorithms are stochastic.
==> not for today.
==> extends UCT to
simultaneous actions
51. The great news about the MCTS field:
● Not related to classical algorithms
(no alpha-beta)
● Recent tools
(Rémi Coulom's paper in 2006)
● Not at all specific to Go
(now widely used in games,
and beyond)
But great performance in Go
needs adaptations
(of the MC part)...
53. Part II: challenges
Two main challenges:
● Situations which require abstract thinking
(cf. Cazenave)
● Situations which involve divide & conquer
(cf Müller)
54. Part I. A success story on Computer Games
Part II. Two unsolved problems in
Computer Games
Part III. Some algorithms which do not solve them
Part IV. Conclusion
70. Requires more than local fighting.
Requires combining several local fights.
Children usually
not so good
at this.
But strong adults
really good.
And computers
very childish.
Looks like a
bad move,
“locally”.
Lee Sedol (black)
Vs
Hang Jansik (white)
Alive!
72. Part I. A success story on Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Some algorithms which
do not solve them
(negative results show that the important stuff is
really in Part II...)
Part IV. Conclusion
73. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
74. Parallelizing MCTS
• On a parallel machine with shared memory: just many
simulations in parallel, the same memory for all.
• On a parallel machine with no shared memory: one
MCTS per comp. node, and 3 times per second:
– Select nodes with at least 5% of total sims (depth at
most 3)
– Average all statistics on these nodes
==> comp cost = log(nb comp nodes)
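The message-passing scheme can be sketched in Python as below. This is an assumed, simplified model: each compute node's tree is a dictionary from a move path (a tuple, the root being `()`) to `(wins, sims)`, the function names are hypothetical, and the averaging is done in one place, whereas a real cluster would use a tree-shaped reduction to reach the log(nb comp nodes) cost.

```python
def nodes_to_share(tree, min_fraction=0.05, max_depth=3):
    """Paths of depth <= max_depth holding at least 5% of the root simulations."""
    total_sims = tree[()][1]
    return {
        path
        for path, (_, sims) in tree.items()
        if len(path) <= max_depth and sims >= min_fraction * total_sims
    }

def average_statistics(trees, min_fraction=0.05, max_depth=3):
    """Average (wins, sims) over all compute nodes for the selected paths.

    Called periodically (3 times per second on the slide).
    """
    shared = set()
    for tree in trees:
        shared |= nodes_to_share(tree, min_fraction, max_depth)
    for path in shared:
        wins = sum(t.get(path, (0.0, 0.0))[0] for t in trees) / len(trees)
        sims = sum(t.get(path, (0.0, 0.0))[1] for t in trees) / len(trees)
        for tree in trees:
            tree[path] = (wins, sims)  # every compute node gets the average
    return shared
```

Only the few heavily simulated nodes near the root are exchanged, which keeps the messages small; deeper or rarely visited nodes stay local to each compute node.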
85. More deeply, 1
(R. Coulom)
Improvement in terms of performance against
humans
<<
Improvement in terms of performance against
computers
<<
Improvements in terms of self-play
86. More deeply, 2
No improvement in divide and conquer.
No improvement on situations
which require abstraction.
87. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
88. What is machine learning ?
= using plenty of data
for deriving useful knowledge.
89. So it's statistics ?
Closely related to statistics
Just a bit more “geek”.
90. Machine learning
Good simulations are crucial.
It is a bit disappointing for the
genericity of the method.
Can we make this
tuning automatic ?
92. Rapid Action Value Estimates
ScoreUCB(m,s) = average reward when
playing move m in situation s ((( + sqrt(...) )))
ScoreRAVE(m,s) = average reward when
playing move m after situation s
==> asymptotically stupid (we want an estimate
of m when it is played now, in s)
==> but non-asymptotically quite great
93. A classical machine learning trick in MCTS: RAVE
(= rapid action value estimates)
score(move) =
alpha UCB(move)
+ (1-alpha) RAVE(move)
alpha^2 = nbSimulations / ( K + nbSimulations )
Usually works well, but performs weakly on some situations.
weakness:
- brings information only from bottom to top of the tree
- does not solve main problems
- sometimes very harmful
==> extensions ?
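The weighting above can be sketched as follows, reading the slide's formula as alpha squared = nbSimulations / (K + nbSimulations). The default value of `k` is a placeholder, not a value from the slides.

```python
import math


def rave_weight(nb_simulations, k):
    """alpha such that alpha^2 = nbSimulations / (K + nbSimulations)."""
    return math.sqrt(nb_simulations / (k + nb_simulations))


def blended_score(ucb, rave, nb_simulations, k=1000.0):
    """score(move) = alpha * UCB(move) + (1 - alpha) * RAVE(move)."""
    alpha = rave_weight(nb_simulations, k)
    return alpha * ucb + (1.0 - alpha) * rave
```

With no simulations the score is pure RAVE; as nbSimulations grows, alpha tends to 1 and the score tends to the UCB value.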
94. A classical machine learning trick in MCTS: RAVE
(= rapid action value estimates)
score(move,s) =
alpha UCB(move,s)
+ (1-alpha) RAVE(move,s)
alpha^2 = nbSimulations / ( K + nbSimulations )
Or better:
● RAVE(m,s) = #cumRewardRAVE(m,s) / #simsRAVE(m,s)
● #simsRAVE(m,s) initialized at 50
● #cumRewardRAVE(m,s) initialized at 50 x expertise(m,s)
Currently, “expertise” is handcrafted.
Can we do better with a neural network ?
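A sketch of the initialization above: the RAVE counters start with 50 virtual simulations whose average reward equals the expertise value. The class name `RaveStats` and the assumption that `expertise(m,s)` is a number in [0, 1] are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RaveStats:
    sims: float        # #simsRAVE(m,s)
    cum_reward: float  # #cumRewardRAVE(m,s)

    @classmethod
    def with_prior(cls, expertise, virtual_sims=50.0):
        """Start from virtual_sims fake simulations with mean reward `expertise`."""
        return cls(sims=virtual_sims, cum_reward=virtual_sims * expertise)

    def update(self, reward):
        """Add the outcome of one real simulation."""
        self.sims += 1.0
        self.cum_reward += reward

    def value(self):
        """RAVE(m,s) = #cumRewardRAVE(m,s) / #simsRAVE(m,s)."""
        return self.cum_reward / self.sims
```

The prior dominates while real simulations are few and fades as they accumulate; after 50 real simulations it carries half of the weight.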
95. Here B2 is the only good move for white.
But B2 makes sense only as a first move,
and nowhere else in subtrees ==> RAVE rejects B2.
==> extensions ?
97. Criticality: how to use it ?
SimsCriticality = c x | Criticality |
● WinsCriticality= SimsCriticality if Criticality >0
● WinsCriticality= 0 otherwise
==> Then, use WinsRAVE + WinsCriticality
and SimsRAVE + SimsCriticality
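The rule above can be sketched as a correction of the RAVE counters. The criticality value itself is taken as an input here, the constant `c` is a tuning parameter with a placeholder default, and the function names are illustrative.

```python
def criticality_bonus(criticality, c):
    """SimsCriticality = c * |Criticality|; WinsCriticality equals
    SimsCriticality if Criticality > 0, and 0 otherwise."""
    sims = c * abs(criticality)
    wins = sims if criticality > 0 else 0.0
    return wins, sims


def rave_with_criticality(wins_rave, sims_rave, criticality, c=10.0):
    """RAVE value computed from WinsRAVE + WinsCriticality and
    SimsRAVE + SimsCriticality (assumes the total count is positive)."""
    wins_c, sims_c = criticality_bonus(criticality, c)
    return (wins_rave + wins_c) / (sims_rave + sims_c)
```

A positive criticality adds virtual wins and pushes the value up; a negative one adds virtual losses and pushes it down; zero leaves the RAVE value unchanged.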
99. Other Machine Learning tricks in MCTS
Generic rules proposed recently:
- Drake [ICGA 2009]: Last Good Reply
- Silver and others: simulation balancing
- poolRave [Rimmel et al, ACG 2011]
- Contextual Monte-Carlo [Rimmel et al, E.G. 2010]
- Decisive moves and anti-decisive moves
[Teytaud et al, CIG 2010]
==> significantly positive, but far less
efficient than human expertise
100. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
101. We don't want to use expert knowledge.
We want automated solutions.
Developing biases by Genetic Programming ?
Genetic programming
= optimizing programs.
E.g. optimizing the
Monte Carlo simulator.
Typically by evolutionary
algorithms.
102. We don't want to use expert knowledge.
We want automated solutions.
Developing biases by Genetic Programming ?
Looks like a good idea.
But importantly:
A strong MC part
(in terms of playing strength of the MC part),
does not imply (by far!)
a stronger MCTS.
(except in 1P cases...)
103. We don't want to use expert knowledge.
We want automated solutions.
Developing a MC by Genetic Programming ?
Hoock et al
Cazenave et al
104. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
107. Nested MCTS in one slide
(Cazenave, F. Teytaud, etc)
1) To a strategy, you can associate a value function:
Value(s)
= expected reward when simulating with this strategy
from state s
2) Then define:
Nested-MC0(state) = MC(state)
Nested-MC1(state) = decision maximizing
NestedMC0-value(next state)
...
Nested-MC42(state) = decision maximizing
NestedMC41-value(next state)
==> looks like a great idea
==> not good in Go
==> good on some less widely known testbeds
(“morpion solitaire”, some hard scheduling pbs)
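A minimal sketch of the nesting on a one-player problem. Two simplifications are assumptions of this sketch: the value of a next state is estimated from a single lower-level run instead of an expectation, and the best sequence found so far is not memorized. The callbacks `moves`, `play`, `reward` and the toy bit-string game are illustrative.

```python
import random


def nested_mc(state, level, moves, play, reward, rng):
    """Play from `state` to the end with the level-`level` strategy and
    return the final reward. Level 0 is a plain random simulation."""
    while moves(state):
        if level == 0:
            move = rng.choice(moves(state))
        else:
            # Pick the decision whose next state looks best to level - 1.
            move = max(moves(state),
                       key=lambda m: nested_mc(play(state, m), level - 1,
                                               moves, play, reward, rng))
        state = play(state, move)
    return reward(state)


# Toy one-player game: write 4 bits, the reward is the number of ones.
N = 4
toy_moves = lambda s: [0, 1] if len(s) < N else []
toy_play = lambda s, m: s + (m,)
toy_reward = lambda s: sum(s)

rng = random.Random(0)
random_result = nested_mc((), 0, toy_moves, toy_play, toy_reward, rng)
nested_result = nested_mc((), N, toy_moves, toy_play, toy_reward, rng)
```

When the level is at least the number of remaining moves, the recursion explores every continuation, so `nested_result` is the optimum on this toy game; the cost grows exponentially with the level, which is why low levels are used in practice.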
108. Part I. A success story on Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Some algorithms which do not solve them
Part IV. Conclusion
109. Part IV: Conclusions
MCTS = algorithm from 2006
● Born in AI for games
● Slightly related to A* and αβ-iterative-deepening
● Widely applicable.
● UCT = one variant (try it first, then test)
● RAVE & other statistics as a bias
● Parallelization + expertise.
● Some clearly identified problems:
- abstract thinking (AI complete ?)
- divide & conquer
110. Part IV: Conclusions
Game of Go:
1- disappointingly,
most recent progress = human expertise
2- UCB is not that much involved in MCTS
(simple rules perform similarly)
==> publication bias
111. Part IV: Conclusions
Recent “generic” progress in MCTS:
1- application to GGP (general game playing):
the program learns the rules of the game
just before the competition, no last-minute
development (fully automated)
==> good model for genericity
==> MCTS very good at this
112. Part IV: Conclusions
Recent “generic” progress in MCTS:
2- one-player games: great ideas which do not
work in 2P games sometimes work in 1P
games (e.g. optimizing the MC in a
DPS, i.e. direct policy search, sense)
113. Part IV: Conclusions
3. Applications in video games (restricted state info)
4. PO games (Minesweeper)
114. ML techniques for understanding from simulations
Abstract thinking (looks like theorem proving)
Understanding this “combination of local stuff”
is impossible for computers
MCTS = versatile, somehow model-free,
convenient, often great. What next ?
Can we compete with Alpha-Beta in e.g. Chess ?