Combining algorithms might be more important than improving, by a few percent, the performance of algorithms by making them more and more specific. As a consequence, many recent works have been devoted to portfolios of algorithms, i.e. the art of combining existing algorithms and selecting the relevant ones. Portfolios of algorithms are classical in optimization and machine learning; this paper focuses on portfolios of policies. We distinguish:
– Nash-Portfolio: cases in which we learn a portfolio-combination offline, based on a portfolio for each player (applicable to adversarial problems);
– Bandit-Portfolio: cases in which we learn a portfolio-combination online, against a fixed opponent (applicable to adversarial problems with a fixed opponent or to stochastic problems).
We apply this methodology to learning Go artificial intelligences. The advantages are (i) diversity (the Nash-Portfolio is more variable than its components), (ii) adaptivity (the Bandit-Portfolio adapts to the opponent), (iii) simplicity, and (iv) increased performance. In particular, we will see that we can "bootstrap" the random seeds.
AI Portfolios Improve Random Seed Algorithms
1. Portfolios of Artificial Intelligences
+ playing with random seeds
1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
J.-B. Hoock, D. L. St-Pierre, O. Teytaud
2. Portfolio
● I have K algorithms for solving a given task :
– MCTS
– Alpha-Beta
– Parametric script
– Nested MC
– …
● I want to choose the best one
3. Two frameworks
● Offline
– I do some work before the competition
– I combine all my algorithms into 1
– Simple version :
● Compute some probability vector p
● For each game, use Algo(i) with probability p(i)
● Online
– For each game,
● Use Algo(i) with probability p(i)
● Update p when the game is over
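A minimal sketch of the « simple version » above, assuming hypothetical Algo objects with a play() method returning 1 for a win:
```python
import random

def play_one_game(algos, p):
    # Pick Algo(i) with probability p(i), then play one game with it.
    i = random.choices(range(len(algos)), weights=p)[0]
    return algos[i].play()  # hypothetical interface: 1 = win, 0 = loss
```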
4. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
5. Offline Nash portfolio
● K algorithms for black BAI(1),..., BAI(K)
● K' algorithms for white WAI(1),...,WAI(K')
● Def : M(i,j) = P( BAI(i) beats WAI(j) )
● Define (p,q) = Nash equilibrium of M
– p = best stochastic portfolio for Black (Nash sense)
– q = best stochastic portfolio for White (Nash sense)
● Portfolio :
– Black : Play BAI(i) with probability p(i)
– White : Play WAI(j) with probability q(j)
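The Nash equilibrium of the matrix game M can be computed by linear programming. A sketch for Black's side, assuming scipy is available (the function name nash_for_black is ours):
```python
import numpy as np
from scipy.optimize import linprog

def nash_for_black(M):
    # M[i, j] = P( BAI(i) beats WAI(j) ).  Find p maximizing Black's
    # worst-case winrate v:  max v  s.t.  sum_i p[i]*M[i, j] >= v for all j,
    # sum_i p[i] = 1, p >= 0.
    K, Kp = M.shape
    c = np.zeros(K + 1)
    c[-1] = -1.0                                 # minimize -v == maximize v
    A_ub = np.hstack([-M.T, np.ones((Kp, 1))])   # v - p.M[:, j] <= 0, all j
    b_ub = np.zeros(Kp)
    A_eq = np.ones((1, K + 1))
    A_eq[0, -1] = 0.0                            # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * K + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:K], res.x[-1]                  # (p, worst-case winrate v)

# White's q is obtained symmetrically, from the game seen from White's side.
```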
6. Other offline portfolios
● K algorithms for black BAI(1),..., BAI(K)
● K' algorithms for white WAI(1),...,WAI(K')
● Definitions :
– Uniform portfolio : p(i) = 1/K, q(j) = 1/K'
– Fixed seed : p(i)=1, q(j)=1 for some i,j
– Best arm : fixed seed where i is the best row / j the best column of M
● Portfolio :
– Black : Play BAI(i) with probability p(i)
– White : Play WAI(j) with probability q(j)
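The slide's « best arm » definition is terse; one plausible reading (an assumption on our part) is the row/column with the best average winrate in M:
```python
import numpy as np

def uniform(K):
    # Uniform portfolio: every algorithm gets the same probability.
    return np.full(K, 1.0 / K)

def best_arm(M):
    # One reading of « best arm »: best average winrate, i.e. the best
    # response to a uniform opponent (an assumption on our part).
    i = int(np.argmax(M.mean(axis=1)))  # Black's best row on average
    j = int(np.argmin(M.mean(axis=0)))  # White's best column on average
    return i, j
```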
7. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
8. Online portfolio (for Black)
● Just apply UCBT (or your favorite bandit)
● Before playing a game :
– p(i) = frequency of win for BAI(i)
– n(i) =number of times BAI(i) was used
– N= sum of the n(i)
– sc(i) = p(i) + C·log(N)/n(i) + C'·sqrt( p(i)·(1-p(i))·log(N) / n(i) )
– choose i* maximizing sc(i)
● Play with BAI(i*)
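The same score, as a small Python sketch (the class name and the values of C and C' are our choices, not from the slide):
```python
import math

class UCBTPortfolio:
    # sc(i) = p(i) + C*log(N)/n(i) + C'*sqrt(p(i)*(1-p(i))*log(N)/n(i))
    def __init__(self, K, C=1.0, Cp=1.0):
        self.wins = [0] * K
        self.n = [0] * K
        self.C, self.Cp = C, Cp

    def choose(self):
        # Play every arm once before trusting the statistics.
        for i, ni in enumerate(self.n):
            if ni == 0:
                return i
        N = sum(self.n)
        def sc(i):
            p = self.wins[i] / self.n[i]
            logN = math.log(N)
            return (p + self.C * logN / self.n[i]
                      + self.Cp * math.sqrt(p * (1 - p) * logN / self.n[i]))
        return max(range(len(self.n)), key=sc)

    def update(self, i, won):
        self.n[i] += 1
        self.wins[i] += int(won)

# Before each game: i = bandit.choose(); play BAI(i); bandit.update(i, won).
```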
9. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
10. Nash
Computed:
● exactly in polynomial time,
● with precision ε in expected time O( (K+K') log(K+K') / ε² ).
The best portfolio in terms of:
● worst-case winrate against the WAI(j),
● worst-case winrate against WAI(j) for j ~ some probability distribution.
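A simple way to get such an ε-approximation (a multiplicative-weights self-play sketch, not the sublinear algorithm behind the bound above; T and eta are hand-picked here):
```python
import numpy as np

def approx_nash(M, T=10000, eta=0.05):
    # Both players run multiplicative weights against each other; the
    # averaged strategies converge to an eps-Nash pair as T grows.
    K, Kp = M.shape
    lp, lq = np.zeros(K), np.zeros(Kp)    # log-weights
    P, Q = np.zeros(K), np.zeros(Kp)      # running averages
    for _ in range(T):
        p = np.exp(lp - lp.max()); p /= p.sum()
        q = np.exp(lq - lq.max()); q /= q.sum()
        P += p; Q += q
        lp += eta * (M @ q)        # Black moves toward better rows
        lq -= eta * (M.T @ p)      # White moves toward better columns
    return P / T, Q / T
```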
11. UCBT for Black
● Nearly zero computational overhead
● Asymptotically optimal winning rate against a
stationary opponent, among the BAI(i)
● We did not try discounted UCB
12. 1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
on 9x9 Go
13. First portfolio : random seeds
● Pick a stochastic algorithm
● Choose K random seeds
● You get K algorithms
Hint : the random seed has a significant impact.
Yes, it works by rote learning (a kind of opening book).
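Turning one stochastic AI into K deterministic algorithms is just freezing seeds; a sketch with a hypothetical base_ai(rng, position) player:
```python
import random

def make_seeded_ai(base_ai, seed):
    # Freeze one random seed inside a stochastic AI: each call re-creates
    # the generator, so the resulting player is deterministic per game.
    def play(position):
        rng = random.Random(seed)
        return base_ai(rng, position)   # hypothetical stochastic player
    return play

# K random seeds ==> K (deterministic) algorithms:
# ais = [make_seeded_ai(base_ai, s) for s in range(K)]
```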
14. Performance of Nash portfolio (learnt offline), in generalization
[figure: winning rate of the Nash portfolio against « new » seeds and vs the uniform portfolio; X-axis = K = K']
==> this means we outperform the default version (which is randomized seeds). Portfolios are here a distribution on random seeds. We get an improved algorithm (winning rate 66%) just with that.
16. Remarks
● Nash portfolio good
● « Best Arm » seed very good
● But we will see that « best arm » has a weakness ==> it can be « overfitted », i.e. easily beaten by a « learning » opponent.
17. UCBT crushes fixed seeds and wins against uniform
[figure: opponents' winning rates against UCBT, with dots decreasing to 0; X-axis = log2(number of games), max. 512 games]
Fixed seeds (deterministic algorithms) are overfitted after 64 games.
19. Other experiments : variants of
some algorithm
● GNU Go with options (32 variants)
● Nash-portfolio or UCBT-portfolio : only a few percent of improvement over a single ad hoc variant
==> less impressive than with random seeds
20. Conclusions
● Nice application for Nash-portfolio:
– Choose a stochastic algorithm
– Build a matrix M of games randomSeed vs
randomSeed
– Compute the Nash equilibrium
– You get a new probability distribution on random seeds
– It should be stronger than the original algorithm.
● Nice application for UCBT-portfolio
– Play against it
– As long as you lose, it will keep the same line of play
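The whole recipe from this slide, as a sketch; play(sb, sw) is a hypothetical oracle returning 1 when Black's seed sb beats White's seed sw, and nash_for_black is the LP sketch from slide 5:
```python
import numpy as np

def nash_seed_portfolio(play, seeds_black, seeds_white, games=100):
    # 1) Estimate M by playing seed-vs-seed games.
    K, Kp = len(seeds_black), len(seeds_white)
    M = np.zeros((K, Kp))
    for i, sb in enumerate(seeds_black):
        for j, sw in enumerate(seeds_white):
            M[i, j] = np.mean([play(sb, sw) for _ in range(games)])
    # 2) Compute the Nash equilibrium (LP sketch from slide 5).
    p, _ = nash_for_black(M)
    # 3) p is the new distribution on random seeds: sample one per game.
    return p
```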
21. Conclusions
● Further work
– Better Nash approximation
– Increase fun (should UCBT explore more or less ? discount ?)
– Bigger experiments (bigger games ? 19x19 ?)
● Comments ?
We forgot to cite your paper ?
We did not try on your favorite game ?
Our results are bullshit ? Please tell us :-)