Computed Prediction: So far, so good. What now?



Pier Luca Lanzi talks at NIGEL 2006 about computed predictions


  1. Computed Prediction: So far, so good. What now? Pier Luca Lanzi, Politecnico di Milano, Italy; Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA
  2. RL
  3. What is the problem? The agent performs action at in state st; the environment returns the next state st+1 and the reward rt+1. How much future reward is received when action at is performed in state st? What is the expected payoff for st and at? GOAL: maximize the amount of reward received in the long run. Compute a value function Q(st, at) mapping state-action pairs into expected future payoffs.
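The value function Q(st, at) described on this slide is typically learnt online. A minimal sketch of one such online update (standard tabular Q-learning; the function and parameter names here are illustrative, not from the talk):

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One online update of Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

# Usage: Q maps (state, action) pairs to expected future payoff.
Q = defaultdict(float)
q_learning_update(Q, s=0, a="right", r=-1.0, s_next=1, actions=["left", "right"])
```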
  4. Example: The Mountain Car. Task: drive an underpowered car up a steep mountain road. st = position, velocity; at = accelerate left, accelerate right, no acceleration; rt = 0 when the goal is reached, -1 otherwise. Learn the value function Q(st, at).
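For concreteness, the classic mountain-car dynamics (as in Sutton & Barto's formulation; the slide itself does not give the equations) with the reward scheme stated above can be sketched as:

```python
import math

def mountain_car_step(x, v, a):
    """One step of the classic mountain-car dynamics.
    x: position in [-1.2, 0.5]; v: velocity in [-0.07, 0.07];
    a: action, -1 (accelerate left), 0 (no acceleration), +1 (accelerate right)."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), -0.07), 0.07)
    x = min(max(x + v, -1.2), 0.5)
    r = 0.0 if x >= 0.5 else -1.0   # rt = 0 at the goal, -1 otherwise
    return x, v, r
```

The car is underpowered because the acceleration term (0.001) is smaller than the gravity term (0.0025), so it must first back up to gain momentum.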
  5. What are the issues? Learning the unknown payoff function while also trying to approximate it: the approximator works on intermediate estimates while also providing information for the learning. The function is unknown; it is learnt online from experience. Exact representation is infeasible, so approximation is mandatory, and convergence is not guaranteed.
  6. Classifiers
  7. Learning Classifier Systems solve reinforcement learning problems. They represent the payoff function Q(st, at) as a population of rules, the classifiers. Classifiers are evolved while Q(st, at) is learnt online.
  8. What is a classifier? IF condition C is true for input s THEN the payoff of action a is p. For real inputs, a typical condition is an interval: C(s) = l ≤ s ≤ u. Generalization depends on how well conditions can partition the problem space: we want general conditions covering large portions of the problem space, yet accurate approximations of the payoff surface. What is the best representation for the problem? Several representations have been developed to improve generalization.
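The IF/THEN rule with an interval condition can be sketched directly (a minimal one-dimensional version; the class and field names are illustrative):

```python
class Classifier:
    """IF l <= s <= u THEN the payoff of `action` is `p`."""
    def __init__(self, l, u, action, p):
        self.l, self.u = l, u      # interval condition C(s) = l <= s <= u
        self.action = action       # the advocated action a
        self.p = p                 # the predicted payoff

    def matches(self, s):
        return self.l <= s <= self.u

# A classifier covering the middle of a [0, 1] problem space.
cl = Classifier(l=0.2, u=0.8, action="right", p=50.0)
```

The generalization trade-off on the slide is visible here: widening [l, u] covers more of the problem space, but a single constant p then has to approximate the payoff over a larger region.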
  9. What is computed prediction? Replace the prediction p by a parametrized function p(x, w): IF condition C is true for input s THEN the payoff of action a is p(x, w), for example the linear approximation p(x, w) = w0 + w1x over the interval condition C(s) = l ≤ s ≤ u. Which type of approximation? Which representation?
  10. Computed Prediction: linear approximation. Each classifier has a vector of parameters w. Classifier prediction is computed as p(x, w) = w · x, where x is the input augmented with a constant term x0. Classifier weights are updated using the Widrow-Hoff update, w ← w + η (P − p(x, w)) x, which moves the prediction towards the target payoff P.
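A minimal sketch of this linear computed prediction with the Widrow-Hoff update, in the style of XCSF (the learning rate η and constant x0 are illustrative values; XCSF additionally normalises the update step by |x|², which is omitted here):

```python
X0 = 1.0     # constant term prepended to the input, as in XCSF
ETA = 0.2    # learning rate (eta); illustrative value

def predict(w, x):
    """Classifier prediction p(x, w) = w0*x0 + w1*x1 + ..."""
    xs = [X0] + list(x)
    return sum(wi * xi for wi, xi in zip(w, xs))

def widrow_hoff(w, x, P):
    """Widrow-Hoff (delta rule) update towards the target payoff P:
    w_i <- w_i + eta * (P - p(x, w)) * x_i."""
    xs = [X0] + list(x)
    err = P - predict(w, x)
    return [wi + ETA * err * xi for wi, xi in zip(w, xs)]
```

Repeated updates shrink the prediction error geometrically, so each classifier's local linear model converges to the payoff surface over the region its condition matches.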
  11. Summary
  12. What are the differences? GOAL: learn the payoff function, which involves two ingredients. REPRESENTATION: the typical Boolean representation 0/1/#, symbols, real intervals, messy conditions, convex hulls, ellipsoids. APPROXIMATOR: gradient descent, linear prediction, neural prediction (O'Hara & Bull 2004), radial basis, sigmoid, tile coding. The typical RL approach asks: what is the best approximator? The LCS approach asks: what is the best representation for the problem?
  13. To represent or to approximate? Powerful representations allow the solution of difficult problems with basic approximators; powerful approximators may make the choice of the representation less critical. Experiment: consider a very powerful approximator that we know can solve a certain RL problem; use it to compute classifier prediction in an LCS and apply the LCS to the same problem. Does genetic search still provide an advantage?
  14. Computed prediction with tile coding. Tile coding is a powerful approximator developed in the reinforcement learning community; given an adequate parameter setting, it can solve the mountain car problem on its own. Here, classifier prediction is computed using tile coding, and each classifier has its own tile-coding parameter setting. What should we expect? When tile coding is used to compute classifier prediction, one classifier can solve the whole problem.
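Tile coding works by laying several uniform tilings over the input space, each shifted by a different offset; the prediction is the sum of one learned weight per tiling. A small sketch for a scalar input (the tiling counts and range are illustrative parameters, exactly the kind of setting that differs between classifiers on this slide):

```python
def active_tiles(x, n_tilings=4, n_tiles=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a scalar input x."""
    width = (hi - lo) / n_tiles
    tiles = []
    for t in range(n_tilings):
        offset = t * width / n_tilings          # each tiling is shifted
        i = int((x - lo + offset) / width)
        i = min(i, n_tiles)                     # clip at the upper edge
        tiles.append(t * (n_tiles + 1) + i)     # unique index per tiling
    return tiles

def tile_predict(weights, x):
    """Prediction = sum of the weights of the active tiles."""
    return sum(weights[i] for i in active_tiles(x))
```

The resolution of the approximator depends on the number of tiles and tilings, which is why an "adequate parameter setting" matters, and why evolving per-classifier settings is interesting.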
  15. The performance? Computed prediction can perform as well as the approximator with the most adequate configuration. The evolution of a population of classifiers provides advantages over a single approximator, even if that same approximator alone might solve the whole problem.
  16. How do parameters evolve?
  17. What now?
  18. What now? Which representation? Which approximator? Let evolution decide! Use a population of classifiers with different approximators to compute prediction: the genetic algorithm selects the best approximators for each problem subspace.
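The "let evolution decide" idea can be sketched as classifiers that carry different approximator types, with fitness-proportionate selection favouring the type whose prediction error is lowest in a given subspace (all names, the fitness formula, and the error values below are illustrative, not the talk's actual algorithm):

```python
import random

def select(population):
    """Roulette-wheel selection with fitness = 1 / (1 + error)."""
    fitnesses = [1.0 / (1.0 + cl["error"]) for cl in population]
    r = random.uniform(0, sum(fitnesses))
    acc = 0.0
    for cl, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return cl
    return population[-1]

# Classifiers matching the same subspace, each with a different approximator.
population = [
    {"approximator": "linear", "error": 0.1},
    {"approximator": "tile coding", "error": 0.5},
    {"approximator": "NN", "error": 2.0},
]
```

Over repeated selection and reproduction, the approximator that fits a subspace best comes to dominate the classifiers covering it.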
  19. Evolving the best approximator
  20. What next? Which representation? Which approximator? Let evolution decide! Use a population of classifiers with different approximators to compute prediction, even if the same approximator alone might solve the whole problem.
  21. Evolving heterogeneous approximators (heterogeneous approximators compared against the most powerful approximator)
  22. What next? Probably done for Boolean conditions. Allow different representations in the same population; let evolution find the most adequate representation for each problem subspace; then, allow different representations and different approximators to evolve all together.
  23. Acknowledgements: Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL.
  24. Thank you! Any questions?