The best results so far for MineSweeper on small board sizes can be found at http://hal.inria.fr/hal-00712417.
Two BibTeX references below:
@article{10.1109/TAAI.2011.55,
  author = {Adrien Cou{\"e}toux and Mario Milone and Olivier Teytaud},
  title = {Consistent Belief State Estimation, with Application to Mines},
  journal = {Technologies and Applications of Artificial Intelligence, International Conference on},
  isbn = {978-0-7695-4601-8},
  year = {2011},
  pages = {280-285},
  doi = {10.1109/TAAI.2011.55},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
And the UCT performance on MineSweeper on small boards:
@inproceedings{sebag:hal-00712417,
hal_id = {hal-00712417},
url = {http://hal.inria.fr/hal-00712417},
title = {{Combining Myopic Optimization and Tree Search: Application to MineSweeper}},
author = {Sebag, Mich{\`e}le and Teytaud, Olivier},
abstract = {{Many reactive planning tasks are tackled by optimization combined with a shrinking horizon at each time step: the problem is simplified to a non-reactive (myopic) optimization problem, based on the available information at the current time step and an estimate of future behavior, and then solved; the simplified problem is updated at each time step thanks to new information. This is in particular suitable when fast off-the-shelf components are available for the simplified problem - optimality stricto sensu is not possible, but good results are obtained at a reasonable computational cost for highly intractable problems. As machines get more powerful, however, it makes sense to go beyond the inherent limitations of this approach. Yet, a brute-force solving of the complete problem is often impossible; we here propose a methodology for embedding a solver inside a consistent reactive planning solver. Our methodology consists in embedding the solver in an Upper-Confidence-Tree algorithm, both in the nodes and as a Monte-Carlo simulator. We show the mathematical consistency of the approach, and then we apply it to a classical success of the myopic approach: the MineSweeper game.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{LION6, Learning and Intelligent Optimization}},
pages = {in press (14 pages, long paper)},
address = {Paris, France},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00712417/PDF/mines2.pdf},
}
2. MINESWEEPER
A. Couëtoux, O. Teytaud
TAO, Inria, Lri, U-Psud, Umr Cnrs 8623 + OASE, NUTN
Sometimes we work on (visibly) serious stuff.
3. MINESWEEPER
But I think the best challenge for proving that we have good algorithms is games.
4. And a great challenge is MineSweeper!
Yes, I'm serious!
21. Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%
==> so all moves equivalent?
==> NOOOOO!!!
25. MineSweeper approaches
- exact MDP: very expensive; 4x4 solved.
- CSP: the main approach.
- (unknown) state: x(i) = 1 if there is a mine at location i
- each visible location is a constraint:
  if location 15 shows 4, then
  x(04)+x(05)+x(06)+x(14)+x(16)+x(24)+x(25)+x(26) = 4.
- find all solutions X1, X2, X3, ..., XN
- P(mine in j) = sum_i X_i(j) / N
- play j such that P(mine in j) is minimal
- break ties at random.
MDP = Markov Decision Process
CSP = Constraint Satisfaction Problem
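The CSP procedure on this slide (enumerate all consistent mine placements, then play the cell with the lowest mine probability) can be sketched in a few lines. The 3x3 board, the single revealed cell, and the mine count below are toy assumptions for illustration, not the setup from the paper:

```python
# Minimal sketch of the CSP probability computation, on an assumed 3x3 toy board.
from itertools import combinations

# Cells are indexed 0..8; suppose the corner cell 0 is revealed and shows "1",
# and we know the board holds exactly 2 mines (illustrative assumption).
REVEALED = {0: 1}   # revealed cell -> displayed neighbour-mine count
N_MINES = 2

def neighbours(i):
    r, c = divmod(i, 3)
    return [3 * rr + cc
            for rr in range(max(0, r - 1), min(3, r + 2))
            for cc in range(max(0, c - 1), min(3, c + 2))
            if (rr, cc) != (r, c)]

hidden = [i for i in range(9) if i not in REVEALED]

# Find all solutions X1..XN: mine placements satisfying every visible count.
solutions = []
for mines in combinations(hidden, N_MINES):
    ms = set(mines)
    if all(sum(n in ms for n in neighbours(i)) == k for i, k in REVEALED.items()):
        solutions.append(ms)

# P(mine in j) = sum_i X_i(j) / N, then play the argmin.
# (Ties here fall to dict order; the slide breaks them at random.)
probs = {j: sum(j in s for s in solutions) / len(solutions) for j in hidden}
best = min(probs, key=probs.get)
```

With these assumptions the neighbours of the revealed corner carry probability 1/3 while the other hidden cells carry 1/5, so `best` is one of the non-adjacent cells.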
26. CSP
- is very fast
- but it is not optimal,
- because of positions like the one shown here, where CSP plays randomly!
Also for the initial move: don't play the first move randomly! (sometimes an opening book is used)
27. Why not UCT?
- looks like a stupid idea at first sight
- cannot compete with CSP in terms of speed
- but at least UCT is consistent: given sufficient time, it plays optimally.
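As a reminder of what UCT involves, here is a minimal, generic sketch of the UCB1 selection rule at its core. The class and names are illustrative assumptions, not the implementation from the paper:

```python
# Generic sketch of UCB1 action selection, the rule UCT applies at each tree node.
import math

class Node:
    def __init__(self, actions):
        self.visits = 0
        # action -> [visit count, total reward]
        self.stats = {a: [0, 0.0] for a in actions}

    def select(self, c=math.sqrt(2)):
        # UCB1: mean reward plus an exploration bonus that shrinks as an
        # action is tried more often; unvisited actions are tried first.
        def ucb(a):
            n, w = self.stats[a]
            if n == 0:
                return float("inf")
            return w / n + c * math.sqrt(math.log(self.visits) / n)
        return max(self.stats, key=ucb)

    def update(self, action, reward):
        # Backpropagation step: record the simulated reward for the action.
        self.visits += 1
        self.stats[action][0] += 1
        self.stats[action][1] += reward
```

Repeatedly calling `select`, simulating a reward, and calling `update` makes the node's choice concentrate on the best action, which is the consistency property the slide refers to.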
40. What do I need for implementing UCT?
A complete generative model: given a state and an action, I must be able to simulate the possible transitions.
State S, Action a: (S,a) ==> S'
Example: given the state below, and the action “top left”, what are the possible next states?
45. Can you please forgive me for that?
I've been lazy: I have just implemented the rejection algorithm.
46. Rejection algorithm:
1- randomly draw the mines
2- if it's ok, return the new observation
3- otherwise, go back to 1.
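The three steps above can be sketched as follows; the 3x3 toy board, the single revealed cell, and the helper names are assumptions for illustration:

```python
# Sketch of the rejection algorithm: keep drawing random mine placements
# until one matches every revealed count, then return it as the sampled state.
import random

SIZE, N_MINES = 3, 2          # illustrative toy board
REVEALED = {0: 1}             # cell 0 is revealed and shows 1 neighbouring mine

def neighbours(i):
    r, c = divmod(i, SIZE)
    return [SIZE * rr + cc
            for rr in range(max(0, r - 1), min(SIZE, r + 2))
            for cc in range(max(0, c - 1), min(SIZE, c + 2))
            if (rr, cc) != (r, c)]

def sample_state():
    hidden = [i for i in range(SIZE * SIZE) if i not in REVEALED]
    while True:
        # 1- randomly draw the mines
        mines = set(random.sample(hidden, N_MINES))
        # 2- if it's ok (all revealed counts match), return the drawn state
        if all(sum(n in mines for n in neighbours(i)) == k
               for i, k in REVEALED.items()):
            return mines
        # 3- otherwise, go back to 1

state = sample_state()
```

Each accepted draw is a hidden state consistent with the observations, which is exactly the generative model UCT needs for simulating transitions; the cost is that many draws are rejected when the constraints are tight.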
47. (Being lazy is good: I could write a second paper with a better algorithm :-) )
(using CSP for this!)
48. An example showing that the initial move matters (and our algorithm finds it!).
3x3, 7 mines: the optimal move is anything but the center.
Optimal winning rate: 25%.
Optimal winning rate with a uniformly random initial move: 17/72.
(yes, we get a 1/72 improvement!)
49. 15 mines on a 5x5 board with the GnoMine rule (i.e. the initial move is a 0):
Optimal success rate = 100%!!!!!
Play the center, and you win (well, you have to work...).
50. UCT vs CSP + opening book (play corners)
in the Windows mode
51. Probability of a mine?
- Top: 33%
- Middle: 33%
- Bottom: 33%
Top or bottom: 66% chance of winning!
Middle: 33%!
52. CONCLUSIONS
- MineSweeper is not dead! ==> still a challenge
- When you have a myopic solver (i.e. one which neglects long-term effects, as is often the case in industry!) ==> combine it with UCT
- More to come: results on big boards are still far from optimal
53. Thanks for your attention!
9 mines. What is the optimal move?