Computed Prediction: So far, so good. What now?

    Computed Prediction: So far, so good. What now? - Presentation Transcript

    1. Computed Prediction: So far, so good. What now? Pier Luca Lanzi, Politecnico di Milano, Italy; Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA
    2. RL
    3. What is the problem? GOAL: maximize the amount of reward received in the long run. How much future reward when action at is performed in state st? What is the expected payoff for st and at? Compute a value function Q(st,at) mapping state-action pairs into expected future payoffs. [Figure: agent-environment interaction loop with state st, action at, and reward rt+1.]
    4. Example: The Mountain Car. Task: drive an underpowered car up a steep mountain road. st = position, velocity. at = acc. left, no acc., acc. right. rt = 0 when goal is reached, -1 otherwise. [Figure: the mountain road with the GOAL at the top, and the value function Q(st,at).]
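The slide gives the state, actions, and reward of the Mountain Car task but not its dynamics. The following is a minimal sketch, assuming the commonly used Sutton and Barto formulation (position in [-1.2, 0.5], speed limited to 0.07); those constants and the function name `step` are assumptions, not taken from the talk. Only the reward scheme (0 at the goal, -1 otherwise) comes from the slide.

```python
import math

# Assumed constants from the common Sutton & Barto formulation (not on the slide).
MIN_POS, MAX_POS = -1.2, 0.5
MAX_SPEED = 0.07
ACTIONS = (-1, 0, 1)  # acc. left, no acc., acc. right


def step(position, velocity, action):
    """Advance the car by one time step.

    Returns (position, velocity, reward, done). The reward follows the slide:
    0 when the goal is reached, -1 otherwise.
    """
    if action not in ACTIONS:
        raise ValueError("action must be -1, 0, or 1")
    # The engine force (0.001) is weaker than gravity on the slope (0.0025).
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    if position <= MIN_POS:
        # Hitting the left wall stops the car.
        position, velocity = MIN_POS, 0.0
    done = position >= MAX_POS
    if done:
        position = MAX_POS
    reward = 0.0 if done else -1.0
    return position, velocity, reward, done
```

Under these assumed dynamics, always accelerating right from rest near the valley floor does not reach the goal, which is why the car has to learn to build momentum first.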
    5. What are the issues? Exact representation infeasible. Approximation mandatory. The function is unknown, it is learnt online from experience. Learning the unknown payoff function while also trying to approximate it. Approximator works on intermediate estimates but it also tries to provide information for the learning. Convergence is not guaranteed.
    6. Classifiers
    7. Learning Classifier Systems: Solve reinforcement learning problems. Represent the payoff function Q(st, at) as a population of rules, the classifiers. Classifiers are evolved while Q(st, at) is learnt online.
    8. What is a classifier? IF condition C is true for input s THEN the payoff of action a is p. Accurate approximations. General conditions covering large portions of the problem space. Generalization depends on how well conditions can partition the problem space. What is the best representation for the problem? Several representations have been developed to improve generalization. [Figure: payoff surface for A, with constant prediction p over the interval condition C(s) = l ≤ s ≤ u.]
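The rule form on slide 8 (IF l ≤ s ≤ u THEN the payoff of action a is p) can be sketched as a small data structure. This is only an illustration for a one-dimensional input; the class name, field names, and the `match_set` helper are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class IntervalClassifier:
    """One rule: IF lower <= s <= upper THEN the payoff of `action` is `prediction`."""
    lower: float
    upper: float
    action: int
    prediction: float

    def matches(self, s):
        # The interval condition C(s) = l <= s <= u from the slide.
        return self.lower <= s <= self.upper


def match_set(population, s):
    """Return the classifiers whose condition is true for input s."""
    return [cl for cl in population if cl.matches(s)]
```

A wide interval generalizes over a large part of the input space but can only predict one constant payoff there, which is the trade-off the slide points at.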
    9. What is computed prediction? Replace the prediction p by a parametrized function p(x,w). Which type of approximation? p(x,w)=w0+xw1. Which representation? Condition C(s)=l≤s≤u. IF condition C is true for input s [Figure: payoff landscape of A, approximated by p(x,w) over the interval from l to u.]
    10. Computed Prediction: Linear approximation. Each classifier has a vector of parameters w. Classifier prediction is computed as [equation not captured in the transcript]. Classifier weights are updated using the Widrow-Hoff update [equation not captured in the transcript].
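The transcript does not include the equations from slide 10. As a rough illustration only, the linear form p(x,w)=w0+xw1 from slide 9 can be combined with the plain Widrow-Hoff (delta rule) update as sketched below. The learning rate `eta`, the function names, and the choice of the unnormalized form of the update are assumptions; the talk may use a different variant.

```python
def predict(w, x):
    """Linear computed prediction: w[0] + w[1]*x[0] + w[2]*x[1] + ..."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def widrow_hoff_update(w, x, target, eta=0.2):
    """Return new weights after one plain delta-rule step toward `target`.

    `eta` is a hypothetical learning rate; the input is extended with a
    constant 1.0 so that w[0] acts as the offset term.
    """
    error = target - predict(w, x)
    inputs = [1.0] + list(x)
    return [wi + eta * error * xi for wi, xi in zip(w, inputs)]
```

For example, starting from zero weights and repeatedly presenting samples of a straight line, the weights move toward that line's offset and slope:

```python
w = [0.0, 0.0]
for _ in range(2000):
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        w = widrow_hoff_update(w, [x], 1.0 + 2.0 * x)
```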
    11. Summary
    12. What are the differences? GOAL: Learn the payoff function. Typical RL approach: What is the best approximator? LCS approach asks: What is the best representation for the problem? [Figure: REPRESENTATION (Boolean representation 0/1/#, real intervals, convex hulls, ellipsoid, messy, symbols, NNs (O'Hara & Bull 2004)) and APPROXIMATOR (gradient descent, linear prediction, radial basis, sigmoid prediction, neural prediction, tile coding, computed prediction).]
    13. To represent or to approximate? Powerful representations allow the solution of difficult problems with basic approximators. Powerful approximators may make the choice of the representation less critical. Experiment: Consider a very powerful approximator that we know can solve a certain RL problem. Use it to compute classifier prediction in an LCS and apply the LCS to solve the same problem. Does genetic search still provide an advantage?
    14. Computed prediction with Tile Coding. Powerful approximator developed in the reinforcement learning community. Tile coding can solve the mountain car problem given an adequate parameter setting. Classifier prediction is computed using tile coding. Each tile coding has a different parameter setting. What should we expect? When using tile coding to compute classifier prediction, one classifier can solve the whole problem.
    15. The performance? Computed prediction can perform as well as the approximator with the most adequate configuration. The evolution of a population of classifiers provides advantages over one approximator, even if the same approximator alone might solve the whole problem.
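To make the idea of tile coding as a prediction function concrete, here is a minimal sketch, assuming evenly shifted grid tilings over a bounded input. The class name `TileCoder`, the parameters `n_tilings`, `tiles_per_dim`, and `alpha`, and the uniform offset scheme are all assumptions for illustration; the slides do not say which tile coding variant or settings were used. In the setting described on the slide, each classifier would carry its own instance with its own parameter setting.

```python
import math


class TileCoder:
    """Grid tile coding with evenly shifted tilings (illustrative sketch)."""

    def __init__(self, lows, highs, n_tilings=4, tiles_per_dim=8):
        self.lows = list(lows)
        self.highs = list(highs)
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        # One extra tile per dimension so shifted tilings still cover the range.
        self.side = tiles_per_dim + 1
        self.tiles_per_tiling = self.side ** len(self.lows)
        self.weights = [0.0] * (n_tilings * self.tiles_per_tiling)

    def active_tiles(self, x):
        """Return one active tile index per tiling for input x."""
        indices = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings  # shift, as a fraction of a tile width
            index = 0
            for lo, hi, xi in zip(self.lows, self.highs, x):
                scaled = (xi - lo) / (hi - lo) * self.tiles_per_dim + offset
                cell = min(self.side - 1, max(0, math.floor(scaled)))
                index = index * self.side + cell
            indices.append(t * self.tiles_per_tiling + index)
        return indices

    def predict(self, x):
        """Prediction is the sum of the weights of the active tiles."""
        return sum(self.weights[i] for i in self.active_tiles(x))

    def update(self, x, target, alpha=0.1):
        """Move the prediction for x a fraction alpha toward target."""
        tiles = self.active_tiles(x)
        error = target - sum(self.weights[i] for i in tiles)
        for i in tiles:
            self.weights[i] += alpha / self.n_tilings * error
```

Because nearby inputs activate mostly the same tiles, an update at one point also changes the prediction for its neighbours, while distant points are unaffected. The number of tilings and the tile width control that trade-off, which is why the parameter setting matters.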
    16. How do parameters evolve?
    17. What now?
    18. What now? Which approximator? Let evolution decide! Population of classifiers using different approximators to compute prediction. The genetic algorithm selects the best approximators for each problem subspace. [Figure: the problem, with the questions "Which representation?" (REPRESENTATION) and "Which approximator?" (APPROXIMATOR).]
    19. Evolving the best approximator
    20. What next? Which approximator? Let evolution decide! Population of classifiers using different approximators to compute prediction, even if the same approximator alone might solve the whole problem. [Figure: the problem, with the questions "Which representation?" (REPRESENTATION) and "Which approximator?" (APPROXIMATOR).]
    21. Evolving Heterogeneous Approximators [Figure labels: Heterogeneous Approximators; Most Powerful Approximator.]
    22. What next? Allow different representations in the same populations (probably done for Boolean conditions). Let evolution evolve the most adequate representation for each problem subspace. Then, allow different representations and different approximators to evolve all together.
    23. Acknowledgements: Daniele Loiacono, Matteo Zanini, and all the current and former members of IlliGAL.
    24. Thank you! Any questions?

    Xavier Llorà, 6 months ago


    Pier Luca Lanzi talks at NIGEL 2006 about computed ...


    © All Rights Reserved
