Combining algorithms might be more important than improving, by a few percent, the performance of algorithms by making them more and more specific. As a consequence, many recent works have been devoted to portfolios of algorithms, i.e. the art of combining existing algorithms and selecting the relevant ones. Portfolios of algorithms are classical in optimization and machine learning; this paper focuses on portfolios of policies. We distinguish:
– Nash-Portfolio: cases in which we learn a portfolio-combination offline, based on a portfolio for each player (applicable to adversarial problems);
– Bandit-Portfolio: cases in which we learn a portfolio-combination online, against a fixed opponent (applicable to adversarial problems with a fixed opponent or to stochastic problems).
We apply this methodology to learning Go artificial intelligences. The advantages are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity, and (iv) increased performance. In particular, we will see that we can "bootstrap" the random seeds.
AI Portfolios Improve Random Seed Algorithms
1. Portfolios of Artificial Intelligences
+ playing with random seeds
1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
J.-B. Hoock, D. L. St-Pierre, O. Teytaud
2. Portfolio
● I have K algorithms for solving a given task :
– MCTS
– Alpha-Beta
– Parametric script
– Nested MC
– …
● I want to choose the best one
3. Two frameworks
● Offline
– I do some work before the competition
– I combine all my algorithms into 1
– Simple version :
● Compute some probability vector p
● For each game, use Algo(i) with probability p(i)
● Online
– For each game,
● Use Algo(i) with probability p(i)
● Update p when the game is over
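A minimal sketch of the « simple version » above, assuming hypothetical Algo objects with a play() method returning 1 for a win:
```python
import random

def play_one_game(algos, p):
    # Pick Algo(i) with probability p(i), then play one game with it.
    i = random.choices(range(len(algos)), weights=p)[0]
    return algos[i].play()  # hypothetical interface: 1 = win, 0 = loss
```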
4. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
5. Offline Nash portfolio
● K algorithms for black BAI(1),..., BAI(K)
● K' algorithms for white WAI(1),...,WAI(K')
● Def : M(i,j) = P( BAI(i) beats WAI(j) )
● Define (p,q) = Nash equilibrium of M
– p = best stochastic portfolio for Black (Nash sense)
– q = best stochastic portfolio for White (Nash sense)
● Portfolio :
– Black : Play BAI(i) with probability p(i)
– White : Play WAI(j) with probability q(j)
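The Nash equilibrium of the matrix game M can be computed by linear programming. A sketch for Black's side, assuming scipy is available (the function name nash_for_black is ours):
```python
import numpy as np
from scipy.optimize import linprog

def nash_for_black(M):
    # M[i, j] = P( BAI(i) beats WAI(j) ).  Find p maximizing Black's
    # worst-case winrate v:  max v  s.t.  sum_i p[i]*M[i, j] >= v for all j,
    # sum_i p[i] = 1, p >= 0.
    K, Kp = M.shape
    c = np.zeros(K + 1)
    c[-1] = -1.0                                 # minimize -v == maximize v
    A_ub = np.hstack([-M.T, np.ones((Kp, 1))])   # v - p.M[:, j] <= 0, all j
    b_ub = np.zeros(Kp)
    A_eq = np.ones((1, K + 1))
    A_eq[0, -1] = 0.0                            # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * K + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:K], res.x[-1]                  # (p, worst-case winrate v)

# White's q is obtained symmetrically, from the game seen from White's side.
```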
6. Other offline portfolios
● K algorithms for black BAI(1),..., BAI(K)
● K' algorithms for white WAI(1),...,WAI(K')
● Definitions :
– Uniform portfolio : p(i) = 1/K, q(j) = 1/K'
– Fixed seed : p(i)=1, q(j)=1 for some i,j
– Best arm : fixed seed where i is the best row / j the best column of M
● Portfolio :
– Black : Play BAI(i) with probability p(i)
– White : Play WAI(j) with probability q(j)
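The slide's « best arm » definition is terse; one plausible reading (an assumption on our part) is the row/column with the best average winrate in M:
```python
import numpy as np

def uniform(K):
    # Uniform portfolio: every algorithm gets the same probability.
    return np.full(K, 1.0 / K)

def best_arm(M):
    # One reading of « best arm »: best average winrate, i.e. the best
    # response to a uniform opponent (an assumption on our part).
    i = int(np.argmax(M.mean(axis=1)))  # Black's best row on average
    j = int(np.argmin(M.mean(axis=0)))  # White's best column on average
    return i, j
```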
7. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
8. Online portfolio (for Black)
● Just apply UCBT (or your favorite bandit)
● Before playing a game :
– p(i) = frequency of win for BAI(i)
– n(i) =number of times BAI(i) was used
– N= sum of the n(i)
– sc(i) = p(i) + C·log(N)/n(i) + C'·sqrt( p(i)·(1-p(i))·log(N) / n(i) )
– choose i* maximizing sc(i)
● Play with BAI(i*)
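The same score, as a small Python sketch (the class name and the values of C and C' are our choices, not from the slide):
```python
import math

class UCBTPortfolio:
    # sc(i) = p(i) + C*log(N)/n(i) + C'*sqrt(p(i)*(1-p(i))*log(N)/n(i))
    def __init__(self, K, C=1.0, Cp=1.0):
        self.wins = [0] * K
        self.n = [0] * K
        self.C, self.Cp = C, Cp

    def choose(self):
        # Play every arm once before trusting the statistics.
        for i, ni in enumerate(self.n):
            if ni == 0:
                return i
        N = sum(self.n)
        def sc(i):
            p = self.wins[i] / self.n[i]
            logN = math.log(N)
            return (p + self.C * logN / self.n[i]
                      + self.Cp * math.sqrt(p * (1 - p) * logN / self.n[i]))
        return max(range(len(self.n)), key=sc)

    def update(self, i, won):
        self.n[i] += 1
        self.wins[i] += int(won)

# Before each game: i = bandit.choose(); play BAI(i); bandit.update(i, won).
```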
9. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
10. Nash
Computed:
● exactly in polynomial time,
● with precision ε in expected time O( (K+K') log(K+K') / ε² ).
The best portfolio in terms of:
● worst-case winrate against the WAI(j),
● worst-case winrate against WAI(j) for j ~ some probability distribution.
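A simple way to get such an ε-approximation (a multiplicative-weights self-play sketch, not the sublinear algorithm behind the bound above; T and eta are hand-picked here):
```python
import numpy as np

def approx_nash(M, T=10000, eta=0.05):
    # Both players run multiplicative weights against each other; the
    # averaged strategies converge to an eps-Nash pair as T grows.
    K, Kp = M.shape
    lp, lq = np.zeros(K), np.zeros(Kp)    # log-weights
    P, Q = np.zeros(K), np.zeros(Kp)      # running averages
    for _ in range(T):
        p = np.exp(lp - lp.max()); p /= p.sum()
        q = np.exp(lq - lq.max()); q /= q.sum()
        P += p; Q += q
        lp += eta * (M @ q)        # Black moves toward better rows
        lq -= eta * (M.T @ p)      # White moves toward better columns
    return P / T, Q / T
```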
11. UCBT for Black
● Nearly zero computational overhead
● Asymptotically optimal winning rate against a
stationary opponent, among the BAI(i)
● We did not try discounted UCB
12. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
on 9x9 Go
13. First portfolio : random seeds
● Pick a stochastic algorithm
● Choose K random seeds
● You get K algorithms
Hint : the random seed has a significant impact.
Yes, it works by rote learning (a kind of opening book).
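Turning one stochastic AI into K deterministic algorithms is just freezing seeds; a sketch with a hypothetical base_ai(rng, position) player:
```python
import random

def make_seeded_ai(base_ai, seed):
    # Freeze one random seed inside a stochastic AI: each call re-creates
    # the generator, so the resulting player is deterministic per game.
    def play(position):
        rng = random.Random(seed)
        return base_ai(rng, position)   # hypothetical stochastic player
    return play

# K random seeds ==> K (deterministic) algorithms:
# ais = [make_seeded_ai(base_ai, s) for s in range(K)]
```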
14. Performance of Nash portfolio (learnt offline), in generalization
[figure: winning rate of the Nash portfolio against « new » seeds and vs the uniform portfolio; X-axis = K = K']
==> this means we outperform the default version (which is randomized seeds). Portfolios are here a distribution on random seeds. We get an improved algorithm (winning rate 66%) just with that.
16. Remarks
● Nash portfolio good
● « Best Arm » seed very good
● But we will see that « best arm » has a weakness ==> it can be « overfitted », i.e. easily beaten by a « learning » opponent.
17. UCBT crushes fixed seeds and wins against uniform
[figure: opponents' winning rates against UCBT, with dots decreasing to 0; X-axis = log2(number of games), max. 512 games]
Fixed seeds (deterministic algorithms) are overfitted after 64 games.
19. Other experiments : variants of
some algorithm
● GNU Go with options (32 variants)
● Nash-portfolio or UCBT-portfolio : only a few percent of improvement over a single ad hoc variant
==> less impressive than with random seeds
20. Conclusions
● Nice application for Nash-portfolio:
– Choose a stochastic algorithm
– Build a matrix M of games randomSeed vs
randomSeed
– Compute the Nash equilibrium
– You get a new probability distribution on random seeds
– It should be stronger than the original algorithm.
● Nice application for UCBT-portfolio
– Play against it
– As long as you lose, it will keep the same line of play
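The whole recipe from this slide, as a sketch; play(sb, sw) is a hypothetical oracle returning 1 when Black's seed sb beats White's seed sw, and nash_for_black is the LP sketch from slide 5:
```python
import numpy as np

def nash_seed_portfolio(play, seeds_black, seeds_white, games=100):
    # 1) Estimate M by playing seed-vs-seed games.
    K, Kp = len(seeds_black), len(seeds_white)
    M = np.zeros((K, Kp))
    for i, sb in enumerate(seeds_black):
        for j, sw in enumerate(seeds_white):
            M[i, j] = np.mean([play(sb, sw) for _ in range(games)])
    # 2) Compute the Nash equilibrium (LP sketch from slide 5).
    p, _ = nash_for_black(M)
    # 3) p is the new distribution on random seeds: sample one per game.
    return p
```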
21. Conclusions
● Further work
– Better Nash approximation
– Increase fun (should UCBT explore more or less ? discount ?)
– Bigger experiments (bigger games ? 19x19 ?)
● Comments ?
We forgot to cite your paper ?
We did not try on your favorite game ?
Our results are bullshit ? Please tell us :-)