The document discusses bandit problems with strategic choices and small budgets. It defines bandit problems and strategic bandit problems, and compares the two. It presents algorithms for exploration and recommendation in both one-player and two-player settings. Experimental results on a Go handicap-positioning problem and an online card game show that TEXP3 outperforms the other algorithms in the two-player setting. The document concludes with a discussion of extensions to structured bandits and of using strategic bandits to model investment choices.
A simple tutorial on Monte-Carlo Tree Search
Contains a description of dynamic programming and alpha-beta search, then MCTS. Special cases for simultaneous actions are discussed.
I should add comments so that it can be used without prior knowledge of MCTS; if there is at least one request for this, I will do it.
@article{gelly:hal-00695370,
hal_id = {hal-00695370},
url = {http://hal.inria.fr/hal-00695370},
title = {{The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions}},
author = {Gelly, Sylvain and Kocsis, Levente and Schoenauer, Marc and Sebag, Mich{\`e}le and Silver, David and Szepesv{\'a}ri, Csaba and Teytaud, Olivier},
abstract = {{The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.}},
language = {English},
affiliation = {TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI , LPDS , Microsoft Research - Inria Joint Centre - MSR - INRIA , University of Alberta, Canada , Department of Computing Science},
publisher = {ACM},
pages = {106--113},
journal = {Communications of the ACM},
volume = {55},
number = {3},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00695370/PDF/CACM-MCTS.pdf},
}
Don't believe what is written in these slides.
These are just provocative statements, most of them found on the internet, included here for discussion and brainstorming.
Noisy Optimization combining Bandits and Evolutionary Algorithms (Olivier Teytaud)
@inproceedings{rolet:inria-00437140,
hal_id = {inria-00437140},
url = {http://hal.inria.fr/inria-00437140},
title = {{Bandit-based Estimation of Distribution Algorithms for Noisy Optimization: Rigorous Runtime Analysis}},
author = {Rolet, Philippe and Teytaud, Olivier},
abstract = {{We show complexity bounds for noisy optimization, in frameworks in which noise is stronger than in previously published papers [19]. We also propose an algorithm based on bandits (variants of [16]) that reaches the bound within logarithmic factors. We emphasize the differences with empirically derived published algorithms.}},
keywords = {noisy optimization, evolutionary algorithms, bandits},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Futurs , TAO - INRIA Saclay - Ile de France},
booktitle = {{LION 4}},
address = {Venice, Italy},
audience = {international},
year = {2010},
pdf = {http://hal.inria.fr/inria-00437140/PDF/lion4long.pdf},
}
@inproceedings{coulom:hal-00517157,
hal_id = {hal-00517157},
url = {http://hal.archives-ouvertes.fr/hal-00517157},
title = {{Handling Expensive Optimization with Large Noise}},
author = {Coulom, R{\'e}mi and Rolet, Philippe and Sokolovska, Nataliya and Teytaud, Olivier},
abstract = {{This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of number of fitness evaluations. Fitnesses considered are monotonic transformations of the {\em sphere} function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum---it is nonetheless also valid for any monotonic polynomial of degree p>2. Upper bounds are derived via a bandit-based estimation of distribution algorithm that relies on Bernstein races called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum, (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression based on surrogate models---although QLR requires a probabilistic prior on the fitness class.}},
keywords = {Noisy optimization, Bernstein races},
language = {English},
affiliation = {SEQUEL - INRIA Lille - Nord Europe , TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI},
booktitle = {{Foundations of Genetic Algorithms (FOGA 2011)}},
pages = {TBA},
address = {Austria},
publisher = {ACM},
audience = {international},
year = {2011},
month = Jan,
pdf = {http://hal.archives-ouvertes.fr/hal-00517157/PDF/foga10noise.pdf},
}
Tools for Discrete Time Control; Application to Power Systems (Olivier Teytaud)
3 main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search termed Direct Value Search
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {Auger, David and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) it is in the jump 0', and (iii) even in the stochastic case, it becomes decidable if we add the requirement that the game halts almost surely whatever the strategies of the players.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scientific},
journal = {International Journal of Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press},
audience = {international},
year = {2012},
}
Students should be able to:
Use simple game theory to illustrate the interdependence that exists in oligopolistic markets
Understand the prisoners’ dilemma and a simple two-firm/two-outcome model. Students should analyse the advantages and disadvantages of being a first mover
Students will not be expected to have an understanding of the Nash equilibrium
This presentation is an attempt to introduce Game Theory in one session. It is suitable for undergraduates. In practice, it is best used as a taster, since only a portion of the material can be covered in an hour; topics can be chosen according to the interests of the class.
The main reference source used was 'Games, Theory and Applications' by L. C. Thomas. Further notes are available at: http://bit.ly/nW6ULD
Choosing between several options in uncertain environments
1. METAGAMING: Bandits with simple regret and small budget
Chen-Wei Chou, Ping-Chiang Chou, Chang-Shing Lee, David Lupien St-Pierre, Olivier Teytaud, Mei-Hui Wang, Li-Wen Wu and Shi-Jim Yen
2. Outline:
- what is a bandit problem?
- what is a strategic bandit problem?
- is a strategic bandit different from a bandit?
- algorithms
- results
3. What is a bandit problem?
A finite number of time steps
A (finite) number of options,
each of them equipped with an (unknown) probability distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its probability distribution
At the end:
- you choose one option (you cannot change it anymore...)
- your reward is the expected reward associated with this option
4. What is a bandit problem? (Slide 3 repeated, with the callout: “Here we collect information”)
5. What is a bandit problem? (Slide 3 repeated, with the callout: “Here we use information for the final choice”)
6. What is a bandit problem? (Slide 3 repeated, with the callout: “Here, we explore”)
7. What is a bandit problem? (Slide 3 repeated, with the callout: “Here, we take no risk”)
8. What is a bandit problem?
A finite number of time steps
A (finite) number of options,
each of them equipped with an (unknown) probability distribution
At each time step (exploration):
- you choose one option
- you get a reward, distributed according to its probability distribution
At the end (recommendation):
- you choose one option (you cannot change it anymore...)
- your reward is the expected reward associated with this option
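The protocol on this slide can be sketched in a few lines of Python. This is a minimal sketch that makes some assumptions. Rewards are Bernoulli (win or loss of one simulated game). Each phase uses its simplest rule: uniform round-robin exploration, then the empirically best option as the recommendation. `run_bandit` and its parameters are hypothetical names. The returned `simple_regret` is the gap between the best expected reward and the expected reward of the recommended option. That gap is the quantity that matters when only the final choice counts.

```python
import random


def run_bandit(arm_means, budget, seed=0):
    """Simple-regret bandit: explore for `budget` steps, then recommend one option.

    arm_means: hidden expected rewards (Bernoulli rewards are assumed).
    Returns (recommended option, simple regret of that recommendation).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    # exploration: rewards are collected but do not count in the final score
    for t in range(budget):
        arm = t % k  # uniform (round-robin) exploration
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    # recommendation: empirically best option (unsampled options get average 0)
    averages = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
    chosen = max(range(k), key=lambda i: averages[i])
    # final reward = expected reward of the chosen option; regret = gap to the best
    simple_regret = max(arm_means) - arm_means[chosen]
    return chosen, simple_regret
```

For example, `run_bandit([0.2, 0.5, 0.8], budget=300)` returns the recommended index and its simple regret. With a very small budget, the regret is more likely to be nonzero.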
9. Which kind of bandit?
- in the bandit literature, options are also termed “arms”
- here the criterion is the expected reward of the option chosen at the end, i.e. the simple regret setting (sometimes it is instead the sum of the rewards obtained during exploration)
- we presented here stochastic bandits (one probability distribution per option) ==> the next slide is different
10. And adversarial bandits?
A finite number of time steps
A (finite) number of options for player 1,
and a finite number of options for player 2.
An unknown probability distribution for each pair of options
At each time step:
- you choose one option for P1 and one option for P2
- you get a reward, distributed according to the corresponding probability distribution
At the end:
- you choose one **probabilistic** option for P1 (you cannot change it anymore...)
- your reward is the expected reward associated with this option, for the worst choice by P2
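The criterion on this slide can be written down when the expected rewards are known. In this sketch, `worst_case_value` is a hypothetical name, and the matrix `payoff[i][j]` of P1's expected rewards is assumed to be given. P1's recommendation is a probability vector `p`, and its value is the expected reward against P2's best reply. That reward is linear in P2's own mixed strategy, so the minimum can be taken over P2's pure options.

```python
def worst_case_value(p, payoff):
    """Expected reward of P1's mixed strategy p against P2's best reply.

    payoff[i][j]: P1's expected reward when P1 plays option i and P2 plays option j.
    The minimum over P2's pure options equals the minimum over P2's mixed
    strategies, because the expected reward is linear in P2's mixing.
    """
    n_cols = len(payoff[0])
    return min(
        sum(p[i] * payoff[i][j] for i in range(len(p)))
        for j in range(n_cols)
    )


# matching pennies: P1 wins (reward 1) when both players pick the same option
pennies = [[1, 0], [0, 1]]
print(worst_case_value([1.0, 0.0], pennies))  # pure option: P2 always mismatches
print(worst_case_value([0.5, 0.5], pennies))  # uniform mix
```

On matching pennies, any pure option is worth 0 against a best reply, while the uniform mix is worth 0.5. This is why the final choice for P1 has to be probabilistic.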
11. What is meta-gaming?
What is a “strategic choice”?
Strategic choices:
- decisions made once and for all, at a high level
- different from the tactical level
Meta-gaming: choice at a strategic level, in games:
- choosing cards, in card games
- choosing the positioning of handicap stones, in Go
==> once and for all, at the beginning of the game
12. Example of a stochastic bandit
(i.e. a 1P strategic choice)
Game of Go handicap bandit problem; at each time step:
- you choose one handicap positioning
- then you simulate one game from this position
==> only one player has a strategic choice
==> stochastic bandit
13. Example of an adversarial bandit
(i.e. a 2P strategic choice)
Urban Rivals bandit problem; at each time step:
- you choose
  - one set of cards for you (P1)
  - one set of cards for P2
- then you simulate one Urban Rivals game from this position
(Figure with panels PLAYER 1 and PLAYER 2)
==> two players have a strategic choice
==> adversarial bandit
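One possible exploration loop for this setting runs one EXP3 learner per player, EXP3 being one of the exploration algorithms listed in these slides. Both learners receive the result of the same simulated game. This is a minimal sketch with some assumptions:
- The Urban Rivals simulator is replaced by a hypothetical matrix `win_prob` of win probabilities.
- A win gives reward 1 and a loss gives reward 0.
- `gamma = 0.1` is an arbitrary illustrative exploration rate.

The update is the textbook EXP3 rule (importance-weighted reward estimate, exponential weights), kept in log space to avoid overflow. `Exp3` and `self_play` are hypothetical names.

```python
import math
import random


class Exp3:
    """Minimal EXP3 learner over k options; rewards must lie in [0, 1]."""

    def __init__(self, k, gamma, rng):
        self.k = k
        self.gamma = gamma        # exploration rate in (0, 1]
        self.rng = rng
        self.log_w = [0.0] * k    # log-weights, stored in log space to avoid overflow
        self.counts = [0] * k     # how often each option was played

    def probabilities(self):
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        total = sum(w)
        # mix the weight-proportional distribution with the uniform one
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def choose(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # reward / prob is an unbiased estimate of the played option's reward
        self.log_w[arm] += self.gamma * (reward / prob) / self.k
        self.counts[arm] += 1


def self_play(win_prob, budget, gamma=0.1, seed=0):
    """Two-player exploration loop.

    win_prob[i][j] is a hypothetical stand-in for the game simulator:
    the probability that P1 wins when P1 picks option i and P2 picks option j.
    Returns how often each player played each option.
    """
    rng = random.Random(seed)
    p1 = Exp3(len(win_prob), gamma, rng)
    p2 = Exp3(len(win_prob[0]), gamma, rng)
    for _ in range(budget):
        i, prob_i = p1.choose()
        j, prob_j = p2.choose()
        r = 1.0 if rng.random() < win_prob[i][j] else 0.0  # one simulated game
        p1.update(i, prob_i, r)
        p2.update(j, prob_j, 1.0 - r)  # zero-sum: P2 is rewarded when P1 loses
    return p1.counts, p2.counts
```

Usage: `self_play([[0.9, 0.9], [0.1, 0.1]], budget=2000)` returns play counts in which P1 concentrates on its dominant first option. Normalised, these counts give an empirical distribution of play that a recommendation rule can use.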
14. Is a strategic bandit problem different from a classical bandit problem?
No difference in nature.
Just a much smaller budget.
15. Algorithms
Reminder:
- two algorithms needed:
  - one for choosing during the N exploration steps
  - one for choosing during the single recommendation step
- two settings:
  - one-player case
  - two-player case
16. Algorithms for exploration
Uniform: test all options uniformly
Bernstein races:
- sample uniformly among non-discarded options,
- discard options with statistical tests
Successive reject:
- sample uniformly among non-discarded options,
- periodically discard the worst option
UCB: choose the option with the best average result + a bonus for weakly sampled options
Adaptive-UCB-E: a variant of UCB aimed at removing hyper-parameters
EXP3: empirically best option + random perturbation
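Here is a Python sketch of two of the one-player exploration rules above. It makes some assumptions:
- Rewards are Bernoulli, drawn by a hypothetical helper `pull` from known means.
- The UCB bonus takes the common UCB1 form sqrt(c ln t / n) with c = 2; the slides do not specify the bonus.
- Successive reject uses the usual Successive Rejects phase schedule; treat the exact constants as an assumption.

`ucb_explore` and `successive_reject` are illustrative names, not the authors' code.

```python
import math
import random


def pull(mean, rng):
    """Hypothetical simulator: one Bernoulli reward (1 = win) for an option with this mean."""
    return 1.0 if rng.random() < mean else 0.0


def ucb_explore(means, budget, c=2.0, seed=0):
    """UCB exploration: best average result + a bonus that is large for weakly sampled options."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, budget + 1):
        if t <= k:
            arm = t - 1  # sample every option once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += pull(means[arm], rng)
    return counts, sums


def successive_reject(means, budget, seed=0):
    """Successive Rejects: K - 1 phases, discarding the empirically worst option after each.

    Returns the only non-discarded option, which is also the recommendation.
    """
    rng = random.Random(seed)
    k = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    for phase in range(1, k):
        # cumulative number of samples each surviving option should have after this phase
        n_phase = max(1, math.ceil((budget - k) / (log_bar * (k + 1 - phase))))
        for arm in active:  # uniform sampling among non-discarded options
            while counts[arm] < n_phase:
                counts[arm] += 1
                sums[arm] += pull(means[arm], rng)
        worst = min(active, key=lambda i: sums[i] / counts[i])
        active.remove(worst)
    return active[0]
```

UCB keeps sampling every option but spends most of the budget on the promising ones. Successive reject instead spends its budget uniformly on a shrinking set, which also makes its recommendation immediate.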
17. Algorithms for recommendation
Empirically Best Arm: choose the empirically best option
Most Played Arm: choose the most simulated option
Successive reject: the only non-discarded option
UCB: choose the option with the best average result + a bonus for weakly sampled options
LCB: choose the option with the best average result + a malus for weakly sampled options
Empirical distribution of play: each option is recommended with probability equal to its frequency during exploration
TEXP3: idem, but discard low-probability options
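The recommendation rules above can be sketched as functions of two exploration statistics: `counts` (simulations per option) and `sums` (total reward per option). These statistics could come from any exploration rule, e.g. EXP3 in the two-player case. The sketch makes two assumptions:
- The LCB malus mirrors a UCB1-style bonus with c = 2.
- For the TEXP3-style rule, the cut-off `threshold` is a hypothetical parameter. The slides only say that low-probability options are discarded; the threshold used in the authors' experiments is not given here.

```python
import math


def empirically_best_arm(counts, sums):
    return max((i for i in range(len(counts)) if counts[i] > 0),
               key=lambda i: sums[i] / counts[i])


def most_played_arm(counts):
    return max(range(len(counts)), key=lambda i: counts[i])


def lcb_arm(counts, sums, c=2.0):
    # best average minus a malus that is large for weakly sampled options
    total = sum(counts)
    return max((i for i in range(len(counts)) if counts[i] > 0),
               key=lambda i: sums[i] / counts[i] - math.sqrt(c * math.log(total) / counts[i]))


def empirical_distribution(counts):
    # each option's probability is its frequency during exploration
    total = sum(counts)
    return [n / total for n in counts]


def truncated_distribution(counts, threshold):
    """TEXP3-style: zero out options whose frequency is below `threshold`, then renormalize."""
    freqs = empirical_distribution(counts)
    cut = min(threshold, max(freqs))  # never discard every option
    kept = [f if f >= cut else 0.0 for f in freqs]
    total = sum(kept)
    return [f / total for f in kept]


counts, sums = [10, 80, 10], [9.0, 40.0, 1.0]
print(empirically_best_arm(counts, sums), most_played_arm(counts), lcb_arm(counts, sums))
print(truncated_distribution(counts, 0.2))
```

On these statistics, the empirically best arm picks option 0, whose average rests on only 10 simulations. LCB and the most played arm both pick the well-sampled option 1. The truncated distribution removes the two rarely played options.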
23. Do you know Killall-Go?
Black has stones in advance (e.g. 8 in 13x13).
If White makes life, White wins.
If Black kills everything, Black wins.
Black chooses the positioning of its stones (strategic decisions).
24. Left: the human is Black and chooses E3 C4.
Right: the computer is Black and chooses D3 D5.
White won both games.
The human said that the computer's choice D3 D5 is good.
25. Killall-Go, H8 (left) and H9 (right)
Left: a human pro player (5P) as Black has 8 handicap stones.
White (computer) makes life and wins.
Right: a human pro player (5P) as Black has 9 handicap stones,
kills everything and wins.
26. CONCLUSIONS
1-player case:
- UCB for exploration,
- LCB or MPA for recommendation
2-player case:
- TEXP3 performs best.
Killall-Go:
- Win against a pro with H2 in 7x7 Killall-Go as White.
- Loss against a pro with H2 in 7x7 Killall-Go as Black.
- 13x13: the computer won as White with H8 and lost with H9.
- 13x13: the computer lost as Black with H8 and with H9.
Further work:
- Structured bandits: some options are close to each other.
- Batoo: Go with strategic choices for both players; a nice test case.
- Industry: choosing investments for power grid simulations (in progress).