Artificial Intelligence and Optimization with Parallelism

My habilitation thesis, 2011

    1. HABILITATION: Artificial Intelligence with Parallelism. Olivier Teytaud, olivier.teytaud@inria.fr. Acknowledgments: all the TAO team; people in Liège, Taiwan, LRI, Artelys, Mash, Iomca, ...; thanks a lot to the committee; thanks and a good recovery to Jonathan Shapiro; thanks to Grid5000.
    2. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    3-6. AI = using computers where they are weak / weaker than humans (thanks Michèle S.). Difficult optimization (complex structure, noisy objective functions). Games (difficult ones). Key difference with many operational research works: AI = choosing a model as close as possible to reality and (very) approximately solving it; OR = choosing the best model that you can solve almost exactly.
    7. Many works are about numbers: providing standard deviations, rates, etc. Another (more ambitious?) goal: switching from something which does not work to something which works. E.g. vision; a computer can distinguish:
    8. But it can't distinguish so easily:
    9-11. And it's a disaster for categorizing children, women, pandas, babies, men, bears, trucks, cars. A 3-year-old child can do it.
    12. ==> AI = focus on things which do not work and (hopefully) make them work.
    13. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    14. Evolutionary optimization is a part of AI. Often considered as bad, because many EO tools are not that hard, mathematically speaking. I've met people using randomized mutations and cross-overs, but who did not call this "evolutionary" or "genetic", because it would look bad.
    15. Gives a lot of freedom: choose your operators (depending on the problem); choose your population size λ (depending on your computer/grid); choose μ (carefully), e.g. μ = min(dimension, λ/4). ==> Can work on strange domains.
    16-21. Voronoi representation of a shape (thanks Marc S.): a family of points and their labels. ==> Cross-over makes sense ==> you can optimize a shape. A great substitute for averaging ("on the benefit of sex").
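As an illustration of why cross-over "makes sense" here, a minimal Python sketch of such an encoding (the representation details and all function names are my assumptions, not the code behind these slides): a shape is a list of labeled Voronoi sites, and cross-over mixes the sites of two parents along a random cut.

```python
import random

# Assumed encoding: a 2-D shape is a list of (x, y, label) Voronoi sites
# in [0,1]^2; label 1 = material, label 0 = void.
def membership(shape, px, py):
    """The label of the nearest site decides whether point (px, py) is filled."""
    nearest = min(shape, key=lambda s: (s[0] - px) ** 2 + (s[1] - py) ** 2)
    return nearest[2]

def crossover(parent_a, parent_b):
    """Geometric cross-over: cut the plane by a random vertical line and take
    each parent's sites on its own side; the offspring is again a valid shape."""
    cut = random.random()
    child = [s for s in parent_a if s[0] < cut] + [s for s in parent_b if s[0] >= cut]
    return child or list(parent_a)  # guard against an empty offspring

def mutate(shape, sigma=0.05):
    """Gaussian move of the sites, with occasional label flips."""
    return [(x + random.gauss(0, sigma), y + random.gauss(0, sigma),
             1 - lab if random.random() < 0.1 else lab)
            for (x, y, lab) in shape]
```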
    22. Cantilever optimization: Hamda et al., 2000.
    23. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    24-25. Parallelism (thank you G5K): multi-core machines, clusters, grids. Sometimes parallelization completely changes the picture. Sometimes not. We want to know when.
    26. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization (==> here: robustness, slow rates); parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    27. Derivative-free optimization of f: no gradient! Only depends on the x's and the f(x)'s.
    28-32. Why derivative-free optimization? OK, it's slower. But sometimes you have no derivative. It's simpler (by far) ==> fewer bugs. It's more robust (to noise, to strange functions...).
    33-35. Optimization algorithms: Newton optimization ==> quasi-Newton (BFGS) ==> gradient descent ==> ... (all need derivatives). Derivative-free optimization (doesn't need gradients), including comparison-based optimization (coming soon), just needing comparisons, including evolutionary algorithms.
    36. Comparison-based optimization: an algorithm observing yi = f(xi) is comparison-based if it uses the yi only through their pairwise comparisons (the signs of the differences yi - yj).
    37. Population-based comparison-based algorithms:
        X(1) = (x(1,1), x(1,2), ..., x(1,λ)) = Opt()
        X(2) = (x(2,1), x(2,2), ..., x(2,λ)) = Opt(X(1), signs of differences)
        ...
        X(n) = (x(n,1), x(n,2), ..., x(n,λ)) = Opt(X(n-1), signs of differences)
    38. Population-based comparison-based algorithms with an internal state:
        (X(1), I(1)) = Opt()
        (X(2), I(2)) = Opt(X(1), I(1), signs of differences)
        ...
        (X(n), I(n)) = Opt(X(n-1), I(n-1), signs of differences)
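In code, the restriction is simply that the optimizer may rank the offspring by fitness but never exploit the fitness values themselves. A toy (μ, λ) step in Python (all names, the step-size rule and the test function are mine), which also demonstrates the g∘f invariance stated on the next slides:

```python
import random

def comparison_based_step(x, sigma, f, lam=8, mu=2):
    """One iteration of a toy (mu, lam)-ES: only the *order* of the
    fitness values is used, never their magnitude."""
    offspring = [[xi + random.gauss(0, sigma) for xi in x] for _ in range(lam)]
    ranked = sorted(offspring, key=f)              # only comparisons matter
    best = ranked[:mu]
    new_x = [sum(col) / mu for col in zip(*best)]  # mean of the mu best
    return new_x, sigma * 0.9                      # (toy step-size rule)

# Invariance check: composing f with any increasing g yields the same run.
f = lambda p: sum(v * v for v in p)
g_of_f = lambda p: 2.0 ** f(p) - 5.0               # g increasing
random.seed(0); run1 = comparison_based_step([1.0, 1.0], 1.0, f)
random.seed(0); run2 = comparison_based_step([1.0, 1.0], 1.0, g_of_f)
assert run1 == run2                                # identical iterates
```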
    39. Comparison-based algorithms are robust. Consider f: X --> R. We look for x* such that ∀x, f(x*) ≤ f(x). ==> What if we see g o f (g increasing)? ==> x* is the same, but xn might change.
    40. Robustness of comparison-based algorithms, formal statement: the sequence of iterates (hence the precision reached) does not depend on g for a comparison-based algorithm; and a comparison-based algorithm is optimal for the worst case over the compositions g o f, g increasing.
    41. Complexity bounds (N = dimension): number of fitness evaluations needed for reaching precision ε with probability at least 1/2, for all f. exp(-convergence ratio) = convergence rate. Convergence ratio ~ 1 / computational cost ==> more convenient than the convergence rate for discussing speed-ups.
    42-49. Complexity bounds, basic technique (ε-balls). We want to know how many iterations we need for reaching precision ε with an evolutionary algorithm. Key observation: (most) evolutionary algorithms are comparison-based. Let's consider (for simplicity) a deterministic, selection-based, non-elitist algorithm. First idea: how many different branches do we have in a run? We select μ points among λ, therefore at most K = λ! / (μ! (λ - μ)!) different branches per iteration. Second idea: how many different answers should we be able to give? Use packing numbers: at least N(ε) different possible answers. Conclusion: the number n of iterations must satisfy Kn ≥ N(ε).
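To make the bound concrete, a toy computation (my own numbers; the packing estimate N(ε) ≈ (1/(2ε))^N for the unit hypercube is a standard bound, not taken from the slides):

```python
from math import comb, log, ceil

def min_iterations(lam, mu, dim, eps):
    """Lower bound on the number of iterations of a deterministic
    selection-based algorithm: K^n >= N(eps), with K = C(lam, mu)
    branches per iteration and N(eps) ~ (1/(2*eps))**dim epsilon-balls
    packed into the unit cube."""
    K = comb(lam, mu)                       # lam! / (mu! (lam - mu)!)
    log_N_eps = dim * log(1.0 / (2.0 * eps))
    return ceil(log_N_eps / log(K))

# Dimension 10, lambda=20, mu=5, precision 1e-3:
print(min_iterations(lam=20, mu=5, dim=10, eps=1e-3))  # -> 7 iterations at least
```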
    50-54. Complexity bounds on the convergence ratio (Fournier, T., 2009; using VC-dimension; the table of bounds is a figure not reproduced in this transcript). FR = full ranking (selected points are ranked); SB = selection-based (selected points are not ranked). This is why I love cross-over. Quadratic functions easier than sphere functions? But not for translation-invariant quadratic functions... Covers existing results; compliant with discrete domains.
    55. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization (==> here: 1) a mathematical proof that all comparison-based algorithms can be parallelized (log speed-up); 2) a practical hint: simple tricks for some well-known algorithms); noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    56-59. Speculative parallelization with branching factor 3. Consider the sequential algorithm (iterations 1, 2, 3). Parallel version for D = 2: the population is the union of all the populations the sequential algorithm could reach within 2 iterations.
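A toy sketch of the mechanism (the ternary-search "optimizer" below is mine, standing in for any deterministic comparison-based algorithm with branching factor 3):

```python
from itertools import product

# Toy deterministic comparison-based optimizer: the state is an interval,
# the "population" is its midpoint, and the comparison outcome (0, 1 or 2)
# tells which third of the interval to keep (branching factor 3).
def opt_next(state, outcome):
    lo, hi = state
    w = (hi - lo) / 3.0
    return (lo + outcome * w, lo + (outcome + 1) * w)

def populations(state):
    lo, hi = state
    return [(lo + hi) / 2.0]

def speculative_batch(state, depth, branching=3):
    """Union of every population reachable within `depth` iterations:
    evaluate all of them in one parallel batch, then keep only the branch
    matching the actual comparison outcomes."""
    batch = set()
    for outcomes in product(range(branching), repeat=depth):
        s = state
        for o in outcomes:
            batch.update(populations(s))
            s = opt_next(s, o)
    return sorted(batch)

print(speculative_batch((0.0, 1.0), depth=2))  # union of populations for D=2
```

After the batch is evaluated in parallel, D sequential iterations cost one parallel evaluation round.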
    60. Automatic parallelization: Teytaud, T., PPSN 2010.
    61. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization (==> here: 1) a mathematical proof that all comparison-based algorithms can be parallelized (log speed-up); 2) a practical hint: simple tricks for some well-known algorithms); noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    62. Define σ* (the normalized progress of one iteration; the precise definition is a formula on the slide). Necessary condition for a log(λ) speed-up: -E log(σ*) ~ log(λ). But for many algorithms, -E log(σ*) = O(1) ==> asymptotically constant speed-up.
    63. These algorithms do not reach the log(λ) speed-up: the (1+1)-ES with the 1/5th rule; standard CSA; standard EMNA; standard SA. (Teytaud, T., PPSN 2010.)
    64. Example 1: Estimation of Multivariate Normal Algorithm (EMNA):
        While (I have time) {
          Generate λ points (x1, ..., xλ) distributed as N(x, σ)
          Evaluate the fitness at x1, ..., xλ
          x = mean of the μ best points
          σ = standard deviation of the μ best points
          σ /= log(λ/7)^(1/d)
        }
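A runnable Python rendering of this loop (a sketch: the isotropic σ, the values of μ, λ, the budget and the test function are my choices):

```python
import random, statistics, math

def emna(f, x0, sigma=1.0, lam=100, mu=25, budget=10_000):
    """EMNA with the log(lambda) correction: sample lam points around x,
    keep the mu best, re-estimate mean and standard deviation, then
    shrink sigma by log(lam/7)**(1/d) as on the slide."""
    x, d = list(x0), len(x0)
    evals = 0
    while evals < budget:
        pop = [[xi + random.gauss(0, sigma) for xi in x] for _ in range(lam)]
        evals += lam
        best = sorted(pop, key=f)[:mu]
        x = [statistics.mean(col) for col in zip(*best)]
        sigma = statistics.mean(statistics.stdev(col) for col in zip(*best))
        sigma /= math.log(lam / 7.0) ** (1.0 / d)
    return x

sphere = lambda p: sum(v * v for v in p)
print(emna(sphere, [5.0] * 3))   # converges toward the optimum at 0
```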
    65. Example 2: log(λ) correction for mutative self-adaptation: μ = min(λ/4, d).
        While (I have time) {
          Generate step-sizes (σ1, ..., σλ) as σ × exp(k·N(0,1))
          Generate points (x1, ..., xλ), with xi distributed as N(x, σi)
          Select the μ best points
          Update x (= mean of the selected points), update σ (= their logarithmic mean)
        }
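And the same exercise for Example 2 (the learning rate k = 1/sqrt(d) is a common default and my assumption here):

```python
import random, statistics, math

def self_adaptive_es(f, x0, sigma=1.0, lam=100, budget=10_000):
    """Mutative self-adaptation with the log(lambda) correction
    mu = min(lam/4, d): each offspring mutates its own step-size,
    and the parent step-size becomes the log-mean of the winners'."""
    d = len(x0)
    mu = max(1, min(lam // 4, d))
    k = 1.0 / math.sqrt(d)                       # assumed learning rate
    x, evals = list(x0), 0
    while evals < budget:
        sigmas = [sigma * math.exp(k * random.gauss(0, 1)) for _ in range(lam)]
        pop = [([xi + random.gauss(0, s) for xi in x], s) for s in sigmas]
        evals += lam
        best = sorted(pop, key=lambda ps: f(ps[0]))[:mu]
        x = [statistics.mean(col) for col in zip(*(p for p, _ in best))]
        sigma = math.exp(statistics.mean(math.log(s) for _, s in best))
    return x

sphere = lambda p: sum(v * v for v in p)
print(self_adaptive_es(sphere, [5.0] * 3))
```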
    66. 66. Log() corrections (SA, dim 3) ● In the discrete case (XPs): automatic parallelization surprisingly efficient. ● Simple trick in the continuous case - E log( *) should be linear in log() (this provides corrections which work for SA and CSA) parallel evolution 66
    67. 67. Log() corrections ● In the discrete case (XPs): automatic parallelization surprisingly efficient. ● Simple trick in the continuous case - E log( *) should be linear in log() (this provides corrections which work for SA and CSA) parallel evolution 67
    68-69. SUMMARY of the EA part up to now: evolutionary algorithms are robust (with a precise statement of this robustness); evolutionary algorithms are somehow slow (precisely quantified...); evolutionary algorithms are parallel (at least "until" the dimension, for the convergence rate). Now, noisy optimization.
    70. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    71-72. Many works focus on fitness functions with "small" noise: f(x) = ||x||² × (1 + Gaussian). This is because the more realistic case f(x) = ||x||² + Gaussian (variance > 0 at the optimum) is too hard for publishing nice curves. ==> See however Arnold & Beyer, 2006. ==> A tool: races (Heidrich-Meisner et al., ICML 2009): reevaluating until statistically significant differences... but we must (sometimes) limit the number of reevaluations.
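A minimal race between two candidate points, in the spirit of the reevaluation idea (a sketch: the Hoeffding-style radius assumes fitness values in [0, 1], and the confidence level and evaluation cap are my choices):

```python
import math, random

def race(noisy_f, x, y, delta=0.05, max_evals=10_000):
    """Reevaluate two points until their mean fitnesses differ by more than
    a Hoeffding confidence radius (statistically significant difference),
    or until the evaluation cap is reached."""
    sx = sy = 0.0
    for n in range(1, max_evals + 1):
        sx += noisy_f(x)
        sy += noisy_f(y)
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # values in [0,1]
        if abs(sx - sy) / n > 2.0 * radius:
            return x if sx < sy else y   # significant winner (minimization)
    return None                          # undecided: cap reached

# Bernoulli-type noisy fitness, as in the slides that follow:
bernoulli = lambda p: 1.0 if random.random() < 0.4 + 0.2 * p else 0.0
print(race(bernoulli, 0.1, 0.9))         # 0.1 wins with high probability
```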
    73-78. Another difficult case: Bernoulli functions, fitness(x) = B(f(x)), with f(0) not necessarily = 0. (Figure annotations: an EDA based on races; MaxUncertainty (Coulom); "I like this case", with p = 2; "we prove good results here", in both regions.)
    79. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    80-82. The game of Go is a part of AI; computers are ridiculous in front of children. An easy situation, termed "semeai"; requires a little bit of abstraction. A top-level program on 800 cores at 4.7 GHz plays a stupid move; an 8-year-old with little training finds the good move.
    83. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization; parallelization; noisy cases. Sequential decision making: fundamental facts; Monte-Carlo Tree Search. Conclusion.
    84. Monte-Carlo Tree Search: 1. Games (a bit of formalism); 2. Decidability / complexity.
    85-90. A game is a directed graph, with actions, and players (Black, White), and observations (Bob, Bear, Bee, Bee), and rewards (+1, 0; on leaves only!), and loops.
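This definition maps directly onto a small data structure; a sketch in Python (field names are mine):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One vertex of the game graph: who plays, what that player observes,
    the outgoing actions, and a reward carried by leaves only."""
    player: str                                  # "White" or "Black"
    observation: str                             # e.g. "Bob", "Bear", "Bee"
    actions: dict = field(default_factory=dict)  # action -> successor Node
    reward: Optional[float] = None               # +1 / 0 on leaves, None inside

win = Node("White", "Bee", reward=1.0)
loss = Node("White", "Bee", reward=0.0)
root = Node("Black", "Bob", actions={1: win, 2: loss})
root.actions[3] = root                           # loops are allowed
```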
    91. Monte-Carlo Tree Search: 1. Games (a bit of formalism); 2. Decidability / complexity.
    92. Complexity (2 players, no random):
                                        Unbounded horizon | Exponential horizon | Polynomial horizon
        Full observability              EXP               | EXP                 | PSPACE
        No obs (X=100%)                 EXPSPACE          | NEXP (Hasslum et al., 2000)
        Partially observable (X=100%)   2EXP              | EXPSPACE (Rintanen, 97)
        Simult. actions                 ?                 | EXPSPACE? ≤ EXP     | ≤ EXP
        No obs / PO: undecidable.
    93. Complexity: which question? (UD) Instance = a position. Question = is there a strategy which wins whatever the decisions of the opponent are? This is the natural question under full observability: answering it then allows perfect play.
    94. Hummm? Do you know a PO game in which you can ensure a win with probability 1?
    95. Complexity question for a matrix game? The payoff matrix is the 6×6 identity: 100000 / 010000 / 001000 / 000100 / 000010 / 000001. Good for the column player! ==> But no sure win ==> the "UD" question is not relevant here!
    96. Complexity question for phantom games? (Joint work with F. Teytaud.) This is phantom-Go. Good for Black: wins with proba 1 - 1/(8!). Here, there is no move which ensures a win, but some moves are much better than others!
    97. Another formalization: is there a strategy winning with probability > c? ==> Much more satisfactory.
    98-99. Madani et al.: 1 player + random = undecidable. We extend this to two players with no random. Problem: rewrite the random nodes, thanks to the additional player.
    100-102. A random node (a uniform choice among successors t0, ..., t(N-1)) is rewritten as follows: Player 1 chooses a in [[0, N-1]]; Player 2 chooses b in [[0, N-1]]; c = (a + b) modulo N; go to tc. Each player can force the game to be equivalent to the initial one (by playing uniformly) ==> the proba of winning for Player 1 (in case of perfect play) is the same as for the initial game ==> undecidability!
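The rewriting works because the two choices are made without seeing each other: if one player draws uniformly, c = (a + b) mod N is uniform whatever the other commits to. A three-line check of this fact (toy N of my choosing):

```python
from collections import Counter

N = 7
# For any b fixed in advance, a uniform a makes c = (a + b) mod N uniform:
for b in range(N):
    counts = Counter((a + b) % N for a in range(N))
    assert all(counts[c] == 1 for c in range(N))
print("c is uniform for every fixed b")
```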
    103. Important remark. Existence of a strategy for winning with proba > 0.5 ==> also undecidable for the restriction to games in which the proba is > 0.6 or < 0.4 ==> not just a subtle precision trouble.
    104. Monte-Carlo Tree Search: the MCTS principle, but with EXP3 in the nodes for hidden information.
    105. UCT (Upper Confidence Trees): Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06).
    106-110. UCT illustrations (growing the tree; Kocsis & Szepesvari, 06).
    111-114. Exploitation... SCORE = 5/7 + k·sqrt(log(10)/7).
    115-116. ...or exploration? SCORE = 0/2 + k·sqrt(log(10)/2). Binary win/loss games: no explo! (Berthier, D., T., 2010.)
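The two scores above are instances of the UCB1 formula: empirical win rate plus a confidence radius. A minimal rendering in Python (the constant k is the exploration parameter; the value 0.4 is my placeholder):

```python
import math

def ucb_score(wins, sims, parent_sims, k=0.4):
    """SCORE = wins/sims + k*sqrt(log(parent_sims)/sims), as on the slides."""
    return wins / sims + k * math.sqrt(math.log(parent_sims) / sims)

# The slides' two candidates, with 10 simulations at the parent:
print(ucb_score(5, 7, 10))   # exploitation: 5/7 + k*sqrt(log(10)/7)
print(ucb_score(0, 2, 10))   # exploration:  0/2 + k*sqrt(log(10)/2)
```

Setting k = 0 removes the exploration term, which is the "no explo" recommendation above for binary win/loss games.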
    117. Games vs. pros in the game of Go: first win in 9x9; first win over 5 games in 9x9 blind Go; first win with H2.5 in 13x13 Go; first win with H6 in 19x19 Go; first win with H7 in 19x19 Go vs. a top pro.
    118. ...or exploration? SCORE = 0/2 + k·sqrt(log(10)/2). Simultaneous actions: replace it with EXP3 / INF.
    119-120. MCTS for simultaneous actions: Player 1 plays (= maxUCB node), Player 2 plays (= minUCB node), both players play (= EXP3 node), then Player 1 plays (= maxUCB node), Player 2 plays (= minUCB node), ...
    121-122. MCTS for hidden information: Player 1 has one EXP3 node per observation set (observation sets 1, 2, 3), and so does Player 2. Thanks Martin. (Incrementally + application to phantom-tic-tac-toe: see D. Auger, 2010.)
    123. EXP3 in one slide: Grigoriadis et al.; Auer et al.; Audibert & Bubeck, COLT 2009.
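The formula itself is a figure in the transcript; as a hedged substitute, here is the textbook EXP3 update (the standard algorithm of Auer et al., which may differ in details from the variant on the slide):

```python
import math, random

class Exp3:
    """EXP3 for an adversarial K-armed bandit: exponential weights on
    importance-weighted reward estimates, mixed with uniform exploration."""
    def __init__(self, k, gamma=0.1):
        self.k, self.gamma = k, gamma
        self.w = [1.0] * k

    def probs(self):
        total = sum(self.w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k
                for wi in self.w]

    def draw(self):
        return random.choices(range(self.k), weights=self.probs())[0]

    def update(self, arm, reward):   # reward in [0, 1]
        p = self.probs()[arm]
        self.w[arm] *= math.exp(self.gamma * reward / (p * self.k))
```

In the trees of the previous slides, one such bandit sits in each EXP3 node.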
    124. Monte-Carlo Tree Search: application to Urban Rivals ==> (simultaneous actions).
    125. Let's have fun with Urban Rivals (4 cards). Each player has: four cards (each one can be used once); 12 pilz (each one can be used once); 12 life points. Each card has: one attack level; one damage; special effects (forget that...). Four turns: P1 attacks P2, P2 attacks P1, P1 attacks P2, P2 attacks P1.
    126. First, the attacker plays: chooses a card, and chooses (PRIVATELY) a number of pilz. Attack level = attack(card) × (1 + nb of pilz). Then the defender plays: chooses a card and a number of pilz. Defense level = attack(card) × (1 + nb of pilz). Result: if attack > defense, the defender loses Power(attacker's card); else, the attacker loses Power(defender's card).
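The resolution rule fits in a few lines of Python (a sketch: the card encoding is mine, special effects are ignored as the slide says, and the card's power is used as the damage):

```python
def resolve(attacker_card, attacker_pilz, defender_card, defender_pilz):
    """One Urban Rivals turn as described on the slide: levels are
    attack*(1+pilz); the loser loses the winner's card power in life
    points. Cards are (attack, power) pairs; special effects ignored."""
    atk_level = attacker_card[0] * (1 + attacker_pilz)
    def_level = defender_card[0] * (1 + defender_pilz)
    if atk_level > def_level:
        return ("defender", attacker_card[1])   # defender loses this many life points
    return ("attacker", defender_card[1])

# E.g. an attack-6 card with 3 pilz vs. an attack-7 card with 2 pilz: 24 > 21.
print(resolve((6, 4), 3, (7, 3), 2))            # -> ('defender', 4)
```

The attacker's pilz choice being PRIVATE is what makes the turn a simultaneous-action subgame, hence the EXP3 nodes.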
    127. ==> The MCTS-based AI is now at the best human level. Experimental (only) remarks on EXP3: discarding strategies with a small number of sims = a better approximation of the Nash; also an improvement by taking into account the other bandit; virtual simulations (inspired by Kummer).
    128. When is MCTS relevant? Robust in front of: high dimension; non-convexity of Bellman values; complex models; delayed reward; simultaneous actions, partial information. More difficult for: high values of H; model-free settings; highly unobservable cases (Monte-Carlo, but not Monte-Carlo Tree Search, see Cazenave et al.); lack of a reasonable baseline for the MC.
    129. Same slide, annotated with pointers: T., Dagstuhl 2010; D. Auger, EvoStar 2011; unpublished results on some endgames; undecidability results.
    130. Conclusion. Evolutionary optimization: robustness, tight bounds, simple algorithmic modifications for a better speed-up (SA, 1/5th rule, (CSA)). MCTS: just great (but requires a model); UCB not necessary; extension to hidden information (remark: undecidability); PO endgames; but no abstraction power. Noisy optimization: consider high noise; use QR and learning (in all EAs, in fact). Not mentioned here: multimodal, multiobjective, GP, bandits.
    131. Future? Solving semeais? Would involve great AI progress, I think... Noisy optimization: there are still things to be done ==> promoting high-noise fitness functions even if it is less publication-efficient. "Inheritance of belief state" in partially observable games: big progress to be done, crucial for applications. Sparse bandits / mixed stochastic-adversarial cases. Thanks for your attention. Thanks to all collaborators for all I've learnt with them.
    132. Appendix 1: MCTS with hidden information.
    133-140. MCTS with hidden information, incremental version (the last slide adds: possibly refine the family of bandits):
        While (there is time for thinking) {
          s = initial state; os(1) = (); os(2) = ()
          while (s not terminal) {
            p = player(s)
            b = Exp3Bandit(os(p))
            d = b.makeDecision
            (s, o) = transition(s, d)
            os(p) = os(p) + (o)    [the new observation extends that player's history]
          }
          send the reward to all bandits used in the simulation
        }
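A runnable fleshing-out of this pseudocode (a sketch: the game-interface callbacks are my assumptions, Exp3 is the bandit class sketched after slide 123, and rewards are taken in [0, 1] from Player 1's viewpoint):

```python
def mcts_hidden_info(initial_state, player, transition, is_terminal, reward,
                     legal_actions, bandit_cls, budget=10_000):
    """One EXP3 bandit per (player, observation sequence); each simulation
    walks the game forward, then its reward is sent to every bandit used.
    Assumes the number of legal actions is determined by the observation
    sequence, so the bandit arm count is stable per key."""
    bandits = {}
    for _ in range(budget):
        s, os_ = initial_state(), {1: (), 2: ()}
        used = []
        while not is_terminal(s):
            p = player(s)
            key = (p, os_[p])
            if key not in bandits:
                bandits[key] = bandit_cls(len(legal_actions(s)))
            b = bandits[key]
            d = b.draw()
            used.append((b, d, p))
            s, o = transition(s, d)
            os_[p] = os_[p] + (o,)    # the observation extends that player's history
        r = reward(s)                  # reward for Player 1 in [0, 1]
        for b, d, p in used:
            b.update(d, r if p == 1 else 1.0 - r)
    return bandits
```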
